Data Visualization: From Data to Insights

Slava Melanko

Last edited 499 days ago by Slava Melanko

⁠

Intro

Analyzing raw data is often difficult and tedious, especially with complex data sets. Data visualization makes such data much easier to understand because it turns piles of numbers into insights that support better decisions. In this post, I'll try to show that with a small real-world example.

🔎 Problem Statement

In February, I noticed a list of utility debts for my building:

⁠

At first glance, the list shows the debt per apartment, the total debt at the end, and nothing more. Being a software engineer with almost a decade of experience, I wanted to create a visual representation of the data for better understanding.

🛠️ Some Technical Details

The only technical information I want to highlight is that I'll use Jupyter Notebook and the Python programming language along with the following libraries:

⁠

Pandas is built on top of NumPy and provides convenient data structures and functions for data manipulation, analysis, and visualization.

Matplotlib provides a range of customizable plotting options for creating static, animated, and interactive visualizations.

Seaborn is built on top of Matplotlib and provides a high-level interface for creating informative and attractive statistical graphics, including heatmaps, distribution plots, time series visualizations, and many more.

The entire code is in my GitHub repository.

📊 Data Visualization

The first and easiest thing to do is to find the “top 10” debtors, the minimum and maximum debt, or the average debt per apartment, e.g.
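Here is a minimal sketch of how such numbers could be computed with Pandas. It is not the code from the repository: the data is made up, and the column names `apartment` and `debt` are assumptions for illustration.

```python
import pandas as pd

# Made-up sample data; the column names "apartment" and "debt" are assumptions.
debts = pd.DataFrame({
    "apartment": [1, 2, 3, 4, 5, 6],
    "debt": [0.0, 1200.5, 300.0, 4500.0, 0.0, 750.25],
})

# The n largest debts, biggest first (use n=10 for a real "top 10").
top_debtors = debts.nlargest(3, "debt")

# Minimum, maximum, and average debt per apartment.
summary = debts["debt"].agg(["min", "max", "mean"])

print(top_debtors)
print(summary)
```

`nlargest` avoids sorting the whole table by hand, and `agg` returns all three statistics as one labeled Series.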

⁠

Also, check out the distribution of debt, for instance:

⁠

By the way, the Pareto principle doesn’t hold here, because 80% of the debt comes from about 41% of people.
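A check like this can be sketched as follows. The helper `share_of_debtors_for` is hypothetical, the numbers are made up, and the sketch assumes that only apartments with a non-zero debt count as "people"; the original calculation may have used a different base.

```python
import pandas as pd

def share_of_debtors_for(debt: pd.Series, target: float = 0.8) -> float:
    """Return the fraction of debtors (largest first) needed to cover `target` of the total debt."""
    # Assumption: apartments without debt are excluded from the count.
    owed = debt[debt > 0].sort_values(ascending=False)
    cumulative_share = owed.cumsum() / owed.sum()
    # Debtors whose running share is still below the target, plus the one that crosses it.
    needed = min(int((cumulative_share < target).sum()) + 1, len(owed))
    return needed / len(owed)

# Made-up debts: five debtors and two apartments without debt.
debt = pd.Series([500, 250, 100, 100, 50, 0, 0])
share = share_of_debtors_for(debt)
print(f"80% of the debt comes from {share:.0%} of debtors")
```

If the Pareto principle held, the result would be close to 20%.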

Let’s create a better and more advanced chart that illustrates the distribution of debt across 3 sections in the building:

⁠

Although the chart above has lots of numbers, I decided to keep all of them. First, it shows the distribution of debt across the 3 sections. It also includes the overall picture: the total debt and the average debt per apartment in the building. Generally speaking, the numbers are shameful, but the second section is slightly better than the others.

I also like the idea of using separate donuts in the form of a progress bar to show the distribution of debt across 3 sections:

⁠

In general, pie charts are better suited for data that has few categories (usually no more than 5) and where the differences in proportions are large enough to be easily discernible.
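The "donut as a progress bar" idea can be sketched with Matplotlib roughly like this. The section totals are made up, and the styling choices are assumptions, not the exact chart from the post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt

# Made-up total debt per section.
section_debt = {"Section 1": 42000, "Section 2": 31000, "Section 3": 45000}
total = sum(section_debt.values())

fig, axes = plt.subplots(1, len(section_debt), figsize=(9, 3))
for ax, (name, value) in zip(axes, section_debt.items()):
    # Two wedges: this section's share and the remainder.
    # A wedge width below 1 turns the pie into a ring.
    ax.pie(
        [value, total - value],
        colors=["tab:red", "lightgray"],
        startangle=90,
        counterclock=False,
        wedgeprops={"width": 0.3},
    )
    # Print the section's share of the total debt in the middle of the ring.
    ax.text(0, 0, f"{value / total:.0%}", ha="center", va="center")
    ax.set_title(name)
```

Each donut has only two wedges, so it stays readable no matter how many sections there are.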

As for the first section, I would like to mention that 21 of its apartments are free of debt:

⁠

So, which section has more good citizens? I don’t see a winner here. I think the result of the above analysis can be illustrated by the following image:

⁠

Last but not least, I made the final visualization that shows the distribution of debt per floor in each section:

⁠

Additionally, it contains mean debt per apartment and per floor in each section.
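The numbers behind such a chart could be prepared with a pivot table. This is only a sketch with made-up data; the column names `section`, `floor`, and `debt` are assumptions.

```python
import pandas as pd

# Made-up data, one row per apartment; the column names are assumptions.
debts = pd.DataFrame({
    "section": [1, 1, 1, 2, 2, 2],
    "floor":   [1, 1, 2, 1, 2, 2],
    "debt":    [100.0, 300.0, 0.0, 50.0, 150.0, 250.0],
})

# Total debt per floor (rows) and section (columns).
per_floor = debts.pivot_table(
    index="floor", columns="section", values="debt", aggfunc="sum"
)

# Mean debt per apartment in each section.
mean_per_apartment = debts.groupby("section")["debt"].mean()

# Mean debt per floor in each section: the average of the per-floor totals.
mean_per_floor = per_floor.mean()

print(per_floor)
print(mean_per_apartment)
print(mean_per_floor)
```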

One of the advantages of using Python for data visualization is the flexibility and customization it offers. For instance, I can apply gradients to represent changes in debt values, making it easier to spot big debtors:
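One possible way to get such a gradient is a Matplotlib heatmap with a sequential colormap. This sketch uses made-up values and may differ from the technique used for the chart in the post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Made-up debts: rows are floors, columns are apartments on that floor.
debt_grid = np.array([
    [0, 1200, 300],
    [4500, 0, 750],
    [200, 90, 2600],
])

fig, ax = plt.subplots()
# A sequential colormap maps larger debts to darker shades of red.
image = ax.imshow(debt_grid, cmap="Reds")
fig.colorbar(image, ax=ax, label="Debt")

# Write each value into its cell; use white text on the darker cells.
threshold = debt_grid.max() / 2
for (row, col), value in np.ndenumerate(debt_grid):
    ax.text(col, row, str(value), ha="center", va="center",
            color="white" if value > threshold else "black")

ax.set_xlabel("Apartment on floor")
ax.set_ylabel("Floor")
```

The darkest cells are the biggest debtors, so they stand out without reading a single number.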

⁠

Final Word

People have to pay their utilities. It's an essential payment. Please don't be like my neighbors.
