Data Visualization: From Data to Insights

Last edited 342 days ago by Slava Melanko
39syng.png

Intro

Analyzing raw data is often difficult and boring, especially when the data set is complex. Data visualization is a powerful and easy way to understand such data sets, because it turns piles of numbers into meaningful insights that support better decisions. In this post, I'll try to prove that with a small real-world example.

🔎 Problem Statement

In February, I noticed a list of utility debts for my building:

debt-list.png

At first glance, the list shows the debt per apartment, the total debt at the end, and nothing more. As a software engineer with almost a decade of experience, I wanted a visual representation of the data to understand it better.

🛠️ Some Technical Details

The only technical detail I want to highlight is that I'll use the Python programming language along with the following libraries:
pandas is built on top of NumPy and provides convenient data structures and functions for data manipulation, analysis, and visualization.
Matplotlib provides a range of customizable plotting options for creating static, animated, and interactive visualizations.
seaborn is built on top of Matplotlib and provides a high-level interface for creating informative and attractive statistical graphics, including heatmaps, distribution plots, time series visualizations, and many more.
The entire code is in .


📊 Data Visualization

The first and easiest thing to do is to find the “top 10” debtors, the minimum and maximum debt, or the average debt per apartment, e.g.

image.png
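A minimal sketch of how such numbers could be computed with pandas. The column names `apartment` and `debt` and all the values are made up for illustration; they are not taken from the real list.

```python
import pandas as pd

# Hypothetical sample data; the real list has one row per apartment.
debts = pd.DataFrame({
    "apartment": [1, 2, 3, 4, 5, 6],
    "debt": [0.0, 1250.5, 310.0, 0.0, 8700.0, 95.25],
})

# Largest debts first (use 10 instead of 3 for a real "top 10").
top_debtors = debts.nlargest(3, "debt")

# Minimum, maximum, and mean debt per apartment in one call.
summary = debts["debt"].agg(["min", "max", "mean"])

print(top_debtors)
print(summary)
```

`nlargest` avoids sorting the whole table just to read the first few rows, which keeps the intent obvious.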

Also, check out the distribution of debt, for instance:

debt_hist.png
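A histogram like this could be drawn with Matplotlib roughly as follows. This is only a sketch: the values and the number of bins are assumptions, and the `Agg` backend line is there so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up debts, one value per apartment.
debts = pd.Series([0, 0, 95, 310, 450, 1250, 2100, 8700], name="debt")

fig, ax = plt.subplots(figsize=(6, 4))
# hist returns the count per bin and the bin edges, handy for checking the plot.
counts, bin_edges, _ = ax.hist(debts, bins=5, edgecolor="white")
ax.set_xlabel("Debt per apartment")
ax.set_ylabel("Number of apartments")
ax.set_title("Distribution of debt")
fig.tight_layout()
```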

BTW, the Pareto principle (the 80/20 rule) doesn’t hold here, because 80% of the debt comes from about 41% of people rather than 20%.
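One way to check an 80/20-style claim is to sort the debts in descending order and count how many of the largest ones are needed to reach 80% of the total. The sketch below uses made-up numbers and measures the share among apartments with a non-zero debt; whether to count debt-free apartments in the denominator is an assumption you would have to settle for your own data.

```python
import pandas as pd

# Made-up debts, one value per apartment.
debt = pd.Series([0, 0, 20, 30, 50, 100, 300, 1000, 2500, 6000], dtype=float)

# Keep only actual debtors, largest debt first.
debtors = debt[debt > 0].sort_values(ascending=False)
cumulative_share = debtors.cumsum() / debtors.sum()

# Number of largest debtors needed to reach at least 80% of the total debt.
n_needed = int((cumulative_share < 0.80).sum()) + 1
share_of_people = n_needed / len(debtors)

print(f"{n_needed} of {len(debtors)} debtors ({share_of_people:.0%}) hold 80% of the debt")
```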
Let’s create a better and more advanced chart that illustrates the distribution of debt across 3 sections in the building:

image.png

Although the above chart has lots of numbers, I decided to keep all of them. First, it shows the distribution of debt across the 3 sections. It also gives the overall picture: the total debt and the average debt per apartment in the building. Generally speaking, the numbers are shameful, but the second section is slightly better than the others.
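The per-section numbers behind a chart like this can be produced with a single `groupby`. A minimal sketch, assuming a hypothetical `section` column and made-up debts:

```python
import pandas as pd

# Hypothetical data: three apartments in each of the 3 sections.
debts = pd.DataFrame({
    "section": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "debt": [0.0, 400.0, 1600.0, 0.0, 100.0, 500.0, 900.0, 1500.0, 0.0],
})

# Total and mean debt per section.
by_section = debts.groupby("section")["debt"].agg(total="sum", mean="mean")
# Each section's share of the whole building's debt, in percent.
by_section["share_pct"] = 100 * by_section["total"] / by_section["total"].sum()

# Building-wide figures for the overall picture.
building_total = debts["debt"].sum()
building_mean = debts["debt"].mean()

print(by_section)
```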
I also like the idea of using separate donuts in the form of a progress bar to show the distribution of debt across 3 sections:

image.png

In general, pie charts are better suited to data with few categories (usually no more than 5) and where the differences in proportions are large enough to be easily discernible.
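A donut used as a progress bar is just a two-wedge pie with a hole in the middle. The sketch below shows one way to draw one donut per section with Matplotlib; the section totals and colors are made up.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt

# Made-up total debt per section.
section_totals = {"Section 1": 2000.0, "Section 2": 600.0, "Section 3": 2400.0}
building_total = sum(section_totals.values())

fig, axes = plt.subplots(1, len(section_totals), figsize=(9, 3))
for ax, (name, total) in zip(axes, section_totals.items()):
    share = total / building_total
    # Two wedges: this section's share and the remainder, drawn as a ring.
    ax.pie(
        [share, 1 - share],
        colors=["tab:red", "lightgray"],
        startangle=90,
        counterclock=False,
        wedgeprops={"width": 0.3},  # a width below 1 turns the pie into a donut
    )
    # Put the percentage in the hole of the donut.
    ax.text(0, 0, f"{share:.0%}", ha="center", va="center", fontsize=14)
    ax.set_title(name)
```

Starting at 90 degrees and going clockwise makes each ring fill up like a clock face, which reads more naturally as progress.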
As for the first section, I would like to mention that 21 of its apartments are free of debt:

image.png
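Counting debt-free apartments per section is a matter of aggregating a boolean column. A small sketch with made-up data and hypothetical column names:

```python
import pandas as pd

# Hypothetical data: section number and debt per apartment.
debts = pd.DataFrame({
    "section": [1, 1, 1, 2, 2, 3, 3, 3],
    "debt": [0.0, 0.0, 1600.0, 0.0, 500.0, 900.0, 1500.0, 0.0],
})

# True for apartments that owe nothing.
debts["debt_free"] = debts["debt"] == 0

# Summing booleans counts the True values; their mean is the debt-free share.
summary = debts.groupby("section")["debt_free"].agg(count="sum", share="mean")

print(summary)
```

Comparing shares rather than raw counts is fairer when sections have different numbers of apartments.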

So, which section has more good citizens? I don’t see a winner here. I think the result of the above analysis can be illustrated by the following image:

cOI9pOV.png

Last but not least, I made the final visualization that shows the distribution of debt per floor in each section:

image.png

Additionally, it contains mean debt per apartment and per floor in each section.
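The table behind a floor-by-section view can be built with a pivot table. This is a sketch under assumptions: the `floor` and `section` columns and every value are hypothetical, and summing the debt per floor is just one reasonable choice of aggregation.

```python
import pandas as pd

# Hypothetical data: two sections, two floors, two apartments per floor.
debts = pd.DataFrame({
    "section": [1, 1, 1, 1, 2, 2, 2, 2],
    "floor":   [1, 1, 2, 2, 1, 1, 2, 2],
    "debt":    [0.0, 200.0, 1000.0, 3000.0, 50.0, 150.0, 0.0, 0.0],
})

# Rows are floors, columns are sections, cells hold the total debt.
floor_by_section = debts.pivot_table(
    index="floor", columns="section", values="debt", aggfunc="sum"
)
# Put the top floor first so the table looks like the building.
floor_by_section = floor_by_section.sort_index(ascending=False)

# Mean debt per floor and per apartment, for each section.
mean_per_floor = floor_by_section.mean()
mean_per_apartment = debts.groupby("section")["debt"].mean()

print(floor_by_section)
```

A table in this shape can be passed directly to seaborn's `heatmap` function to get a colored grid.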
One of the advantages of using Python for data visualization is the flexibility and customization it offers. For instance, I can apply gradients to represent changes in debt values, making it easier to spot big debtors:

image.png
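One way to get such a gradient is to map each debt value onto a colormap. The sketch below colors the bars of a bar chart, which may differ from the chart above; the apartments, the values, and the choice of the `Reds` colormap are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# Made-up apartments and debts.
apartments = ["1", "2", "3", "4", "5"]
debts = [0.0, 250.0, 900.0, 3200.0, 8700.0]

# Scale the debts to the 0..1 range, then look up a color for each value.
norm = Normalize(vmin=min(debts), vmax=max(debts))
cmap = plt.get_cmap("Reds")
colors = [cmap(norm(value)) for value in debts]

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(apartments, debts, color=colors)
ax.set_xlabel("Apartment")
ax.set_ylabel("Debt")
fig.tight_layout()
```

With a sequential colormap, the darkest bars are the biggest debtors, so they stand out even before you read the axis.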

Final Word

People have to pay their utilities. It's an essential payment. Please don't be like my neighbors.
