Analyzing raw data is often difficult and tedious, especially when the data set is complex. Data visualization is a powerful and accessible way to understand such data because it turns piles of numbers into insights that support better decisions. In this post, I'll try to demonstrate that with a small real-world example.
🔎 Problem Statement
In February, I noticed a list of utility debts for my building:
At first glance, the list shows the debt per apartment, the total debt at the end, and nothing more. As a software engineer with almost a decade of experience, I wanted to visualize the data to understand it better.
🛠️ Some Technical Details
The only technical detail I want to highlight is that I’ll use Seaborn. Seaborn is built on top of Matplotlib and provides a high-level interface for creating informative and attractive statistical graphics, including heatmaps, distribution plots, time series visualizations, and many more.
The Pareto principle (the 80/20 rule) doesn’t hold here, because 80% of the debt comes from about 41% of the people.
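To make the 80/20 check concrete, here is a minimal sketch of how such a figure can be computed. The function name `share_of_debtors_for` and the sample numbers are hypothetical, not the real list from my building. I also assume that only apartments with a non-zero debt count as "people" here; counting every apartment would give a different percentage.

```python
def share_of_debtors_for(debts, target=0.8):
    """Return the fraction of debtors who together owe `target` of the total debt."""
    # Assumption: only apartments that actually owe money count as debtors.
    owed = sorted((d for d in debts if d > 0), reverse=True)
    if not owed:
        return 0.0  # nobody owes anything

    total = sum(owed)
    running = 0.0
    # Walk from the biggest debtor down until the target share is covered.
    for count, debt in enumerate(owed, start=1):
        running += debt
        if running >= target * total:
            return count / len(owed)
    return 1.0


# Hypothetical debts per apartment, for illustration only.
sample_debts = [600, 400, 300, 150, 50, 0, 0]
share = share_of_debtors_for(sample_debts)
print(f"{share:.0%} of debtors account for 80% of the debt")
```

If the Pareto principle held, the result would be close to 20%. A much larger value means the debt is spread across many apartments rather than concentrated in a few.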
Let’s create a better and more advanced chart that illustrates the distribution of debt across 3 sections in the building:
Although the chart above contains a lot of numbers, I decided to keep all of them. First, it shows the distribution of debt across the 3 sections. It also includes the overall picture: the total debt and the average debt per apartment in the building. Generally speaking, the numbers are shameful, but the second section is slightly better than the others.
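The numbers behind such a chart can be prepared with a simple aggregation. The sketch below assumes the list has been loaded into a pandas DataFrame with one row per apartment. The column names `section` and `debt` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data: one row per apartment (not the real list).
df = pd.DataFrame({
    "section": [1, 1, 1, 2, 2, 3, 3, 3],
    "debt": [1200.0, 0.0, 300.0, 0.0, 450.0, 2000.0, 150.0, 0.0],
})

# Total and mean debt per section.
by_section = df.groupby("section")["debt"].agg(total="sum", mean="mean")

# Each section's share of the building's total debt.
by_section["share"] = by_section["total"] / df["debt"].sum()

print(by_section)
print("Total debt:", df["debt"].sum())
print("Average debt per apartment:", df["debt"].mean())
```

The `share` column is what a pie or donut chart would display, while the totals and means can go into the labels.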
I also like the idea of using separate donuts in the form of a progress bar to show the distribution of debt across 3 sections:
In general, pie charts are better suited to data that has few categories (usually no more than 5) and where the differences in proportions are large enough to be easily discernible.
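For reference, a "progress bar" donut can be drawn with plain Matplotlib by plotting two wedges (the section's share and the remainder) and giving them a width smaller than the radius. This is only a sketch of the idea with hypothetical shares, not the exact code behind my charts.

```python
import matplotlib.pyplot as plt

# Hypothetical shares of the total debt per section (they sum to 1).
shares = {"Section 1": 0.37, "Section 2": 0.11, "Section 3": 0.52}

fig, axes = plt.subplots(1, len(shares), figsize=(9, 3))
for ax, (name, share) in zip(axes, shares.items()):
    # Two wedges: the section's share and the remainder in light grey.
    ax.pie(
        [share, 1 - share],
        colors=["tab:red", "lightgrey"],
        startangle=90,
        counterclock=False,
        wedgeprops={"width": 0.3},  # width < 1 turns the pie into a ring
    )
    # Put the percentage in the hole of the donut.
    ax.text(0, 0, f"{share:.0%}", ha="center", va="center", fontsize=14)
    ax.set_title(name)

fig.savefig("debt_share_donuts.png", dpi=150)
```

Because every donut has only two wedges, each one stays readable no matter how many sections there are.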
As for the first section, I would like to mention that 21 of its apartments are free of debt:
So, which section has more good citizens? I don’t see a winner here. I think the result of the above analysis can be illustrated by the following image:
Last but not least, I made the final visualization that shows the distribution of debt per floor in each section:
It also shows the mean debt per apartment and per floor in each section.
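A per-floor breakdown like this boils down to a pivot table. Here is a minimal sketch, assuming a DataFrame with hypothetical `section`, `floor`, and `debt` columns and made-up values:

```python
import pandas as pd

# Hypothetical data: one row per apartment.
df = pd.DataFrame({
    "section": [1, 1, 1, 1, 2, 2, 2, 2],
    "floor":   [1, 1, 2, 2, 1, 1, 2, 2],
    "debt":    [100.0, 300.0, 0.0, 800.0, 0.0, 0.0, 250.0, 50.0],
})

# Rows are floors, columns are sections, cells hold the total debt.
per_floor = df.pivot_table(
    index="floor", columns="section", values="debt", aggfunc="sum"
)

# Mean debt per apartment and per floor in each section.
mean_per_apartment = df.groupby("section")["debt"].mean()
mean_per_floor = per_floor.mean(axis=0)

print(per_floor)
print(mean_per_apartment)
print(mean_per_floor)
```

The resulting `per_floor` table has the same shape as the building itself, which makes it a natural input for a grid-like chart.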
One of the advantages of using Python for data visualization is the flexibility and customization it offers. For instance, I can apply gradients to represent changes in debt values, making it easier to spot big debtors:
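One way to get such a gradient is a Seaborn heatmap with a sequential colormap, where darker cells mean larger debts. The sketch below uses hypothetical totals per floor and section. It shows the general technique and is not necessarily how the chart above was produced.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical total debt per floor (rows) and section (columns).
per_floor = pd.DataFrame(
    {
        "Section 1": [400, 800, 0],
        "Section 2": [0, 300, 150],
        "Section 3": [1200, 50, 600],
    },
    index=pd.Index([1, 2, 3], name="Floor"),
)

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(
    per_floor.sort_index(ascending=False),  # top floor on the top row
    cmap="Reds",        # sequential colormap: darker means more debt
    annot=True,         # print the value inside each cell
    fmt=".0f",
    linewidths=0.5,
    cbar_kws={"label": "Debt"},
    ax=ax,
)
ax.set_title("Debt per floor and section")
fig.savefig("debt_heatmap.png", dpi=150)
```

Sorting the index in descending order keeps the top floor at the top of the picture, so the chart reads like the building's facade.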
Final Words
People have to pay their utility bills. It's an essential payment. Please don't be like my neighbors.