Analyzing raw data is often difficult and tedious, especially when the data set is complex. Data visualization is a powerful yet easy way to understand such data, because it turns piles of numbers into insights that support better decisions. In this blog post, I'll try to prove that with a small real-world example.
This post was made with the help of the AI tools like
In February, I noticed a list of utility debts for my building:
The list of utility debts.
At first glance, the list shows the debt per apartment, the total debt at the end, and nothing more. Being a software engineer with almost a decade of experience, I wanted to create a visual representation of the data for better understanding.
🛠️ Some Technical Details
I want to omit technical topics and only mention that I’ll use
is built on top of Matplotlib and provides a high-level interface for creating informative and attractive statistical graphics, including heatmaps, distribution plots, time series visualizations, and many more.
doesn’t work here, because 80% of the debt comes from about 41% of people.
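A figure like "80% of the debt comes from about 41% of people" can be checked with a few lines of plain Python. The sketch below is only an illustration: the function name, the sample numbers, and the choice to count debtors only (rather than all apartments) are my assumptions, not the exact calculation behind the number above.

```python
def share_of_debtors_for(debts, target=0.8):
    """Fraction of debtors (largest first) needed to reach `target` of the total debt."""
    # Assumption: apartments with zero debt are not counted as debtors.
    positive = sorted((d for d in debts if d > 0), reverse=True)
    total = sum(positive)
    if total == 0:
        return 0.0
    running = 0.0
    for count, debt in enumerate(positive, start=1):
        running += debt
        if running >= target * total:
            return count / len(positive)
    return 1.0


# Hypothetical debts per apartment, not the real list from my building.
sample_debts = [100, 50, 30, 10, 10, 0, 0]
print(f"{share_of_debtors_for(sample_debts):.0%} of debtors hold 80% of the debt")
```

The closer the result is to 20%, the closer the data is to a classic 80/20 split.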
Let’s create a better and more advanced chart that illustrates the distribution of debt across 3 sections in the building:
The distribution of debt across 3 sections in the building
Although the chart above contains a lot of numbers, I decided to keep all of them. First, it shows the distribution of debt across the 3 sections. It also gives the overall picture: the total debt and the average debt per apartment in the building. Overall, the numbers are shameful, but the second section is slightly better than the others.
Additionally, I like the idea of using separate donuts in the form of a progress bar to show the distribution of debt across 3 sections:
In general, pie charts are better suited for data that has few categories (usually no more than 5) and where the differences in proportions are large enough to be easily discernible.
As for the first section, 21 of its apartments are free of debt:
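For the curious, here is a rough sketch of how such "progress bar" donuts could be drawn with Matplotlib. The section totals are hypothetical placeholders, and the function names are mine. The idea is to draw one two-slice pie per section (the section's share and the remainder) and hollow it out with `wedgeprops`.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for an interactive window
import matplotlib.pyplot as plt

# Hypothetical debt totals per section; replace them with real figures.
SECTION_DEBT = {"Section 1": 52_000.0, "Section 2": 31_000.0, "Section 3": 47_000.0}


def debt_shares(section_debt):
    """Return each section's share of the total debt as a fraction of 1."""
    total = sum(section_debt.values())
    return {name: debt / total for name, debt in section_debt.items()}


def draw_section_donuts(section_debt):
    """Draw one donut per section, filled in proportion to its share of the debt."""
    shares = debt_shares(section_debt)
    fig, axes = plt.subplots(1, len(shares), figsize=(3 * len(shares), 3), squeeze=False)
    for ax, (name, share) in zip(axes[0], shares.items()):
        ax.pie(
            [share, 1 - share],               # filled part, then the remainder
            colors=["tab:red", "lightgray"],
            startangle=90,                    # start at 12 o'clock
            counterclock=False,               # fill clockwise, like a progress bar
            wedgeprops={"width": 0.3},        # ring width turns the pie into a donut
        )
        ax.text(0, 0, f"{share:.0%}", ha="center", va="center", fontsize=14)
        ax.set_title(name)
    return fig
```

Calling `draw_section_donuts(SECTION_DEBT).savefig("donuts.png")` writes the picture to a file.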
21 apts. without debt in the 1st section; 20 in the 2nd; 15 in the 3rd.
So, which section has more good citizens? I don’t see a winner here. I think the result of the above analysis can be illustrated by the following image:
Last but not least, I made a final visualization that shows the distribution of debt per floor in each section:
Additionally, it contains mean numbers for debt per apartment and per floor in each section.
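The mean values themselves are simple to compute. Below is a minimal sketch using only the standard library. The `(section, floor, debt)` record layout and the sample numbers are hypothetical, and "mean per floor" is taken here to be the average of the floor totals within a section.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (section, floor, debt of one apartment).
APARTMENTS = [
    (1, 1, 0.0), (1, 1, 1200.0), (1, 2, 300.0),
    (2, 1, 0.0), (2, 2, 800.0), (2, 2, 400.0),
]


def debt_per_floor(apartments):
    """Sum the debt of all apartments on each (section, floor) pair."""
    totals = defaultdict(float)
    for section, floor, debt in apartments:
        totals[(section, floor)] += debt
    return dict(totals)


def section_means(apartments):
    """Return mean debt per apartment and per floor for every section."""
    per_floor = debt_per_floor(apartments)
    result = {}
    for section in sorted({s for s, _, _ in apartments}):
        apartment_debts = [d for s, _, d in apartments if s == section]
        floor_totals = [t for (s, _), t in per_floor.items() if s == section]
        result[section] = {
            "mean_per_apartment": mean(apartment_debts),
            "mean_per_floor": mean(floor_totals),
        }
    return result


for section, stats in section_means(APARTMENTS).items():
    print(section, stats)
```

Debt-free apartments are included in the per-apartment mean on purpose, so a section with many good payers looks better.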
One of the advantages of using Python for data visualization is the flexibility and customization it offers. For instance, I can apply gradients to represent changes in debt values, making it easier to spot big debtors:
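As a rough illustration of the gradient idea, the sketch below colors a floor-by-section grid with a sequential colormap, so the largest debts come out darkest. The grid values are hypothetical and the function name is mine. It uses plain Matplotlib (`imshow` plus a colorbar) and is not necessarily how the chart in this post was produced.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for an interactive window
import matplotlib.pyplot as plt

# Hypothetical debt totals: one row per floor (floor 1 first), one column per section.
DEBT_BY_FLOOR = [
    [1200.0, 0.0, 450.0],
    [300.0, 800.0, 0.0],
    [5200.0, 150.0, 2300.0],
    [0.0, 950.0, 700.0],
]


def draw_debt_heatmap(debt_by_floor):
    """Draw a floor-by-section grid where darker red means more debt."""
    n_floors = len(debt_by_floor)
    n_sections = len(debt_by_floor[0])
    fig, ax = plt.subplots(figsize=(4, 5))
    # origin="lower" puts floor 1 at the bottom, as in a real building.
    image = ax.imshow(debt_by_floor, cmap="Reds", origin="lower", aspect="auto")
    ax.set_xticks(range(n_sections))
    ax.set_xticklabels([f"Section {i + 1}" for i in range(n_sections)])
    ax.set_yticks(range(n_floors))
    ax.set_yticklabels([f"Floor {i + 1}" for i in range(n_floors)])
    # Print the value inside each cell so the exact numbers stay readable.
    for row, floor_debts in enumerate(debt_by_floor):
        for col, debt in enumerate(floor_debts):
            ax.text(col, row, f"{debt:.0f}", ha="center", va="center")
    fig.colorbar(image, ax=ax, label="Debt")
    return fig, image
```

With the sample grid, the 5200 cell on the third floor of the first section stands out immediately, which is exactly the point of using a gradient.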
People have to pay their utilities. It's an essential payment. Please don't be like my neighbors.