Clearer Decision-Making with PCA

Learn how to generate the best chart of your multidimensional data

Sebastien Tien

In our daily digital life, we are provided with plenty of information, meant to help us make better decisions: lists of products and services, and for each, a complete data sheet, along with a list of reviews, and for each, more information about the reviewer, and so on. This amount of data can feel overwhelming for our decision-making: Which information is relevant? What best distinguishes these two products?

Enter Principal Component Analysis (PCA), a process routinely used in data exploration, machine learning, and data compression. PCA offers a different look at your data, providing you with the best visual summary possible.

This doc will give you an intuitive grasp of PCA and the tools to apply it to your decision-making, via a 📦 dedicated Pack.

So What is PCA?

Let’s imagine you’d like to understand how the leaves on a tree are distributed.

We could measure the position and orientation of each leaf, but that’s a lot of leaves, and a lot of data!

Alternatively, we could look at the tree’s shadow, which provides a good indication of this distribution. It’s simpler to look at, even if it’s not the tree itself.

We could do even better: we could choose the position of the sun that puts the most distance between the leaves’ shadows, that is, the one that separates them most.

In a nutshell, this is what PCA can do for us: it determines the best lower-dimensional space for our dataset. “Best” in the sense that it maximises the dispersion of the dataset’s elements.

(image posted on Reddit by u/DanRG02)
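To make the shadow analogy concrete, here is a minimal sketch in plain Python. It tries many directions for a handful of 2D points and keeps the one along which their "shadows" (projections) are most spread out. The `leaves` points and the function names are made up for illustration, and a real PCA computes this direction directly instead of scanning angles.

```python
import math

def shadow_spread(points, angle):
    """Variance of the points' projections onto the direction at `angle` (radians)."""
    ux, uy = math.cos(angle), math.sin(angle)
    shadows = [x * ux + y * uy for x, y in points]
    mean = sum(shadows) / len(shadows)
    return sum((s - mean) ** 2 for s in shadows) / (len(shadows) - 1)

def best_direction(points, steps=180):
    """Scan `steps` directions between 0 and 180 degrees; keep the widest spread."""
    angles = [math.pi * k / steps for k in range(steps)]
    return max(angles, key=lambda a: shadow_spread(points, a))

# Hypothetical 2D "leaves", roughly stretched along the diagonal.
leaves = [(0.0, 0.2), (1.0, 0.9), (2.0, 2.1), (3.0, 2.8), (4.0, 4.1)]

angle = best_direction(leaves)
print(f"best direction: {math.degrees(angle):.0f} degrees")
print(f"spread along it:      {shadow_spread(leaves, angle):.2f}")
print(f"spread along x alone: {shadow_spread(leaves, 0.0):.2f}")
```

The direction found this way plays the role of the first principal component: no other single axis keeps the points further apart.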

PCA is incredibly useful:

PCA computes, all at once, the best representation of our dataset in several forms: as a ranking (1 dimension), as a map (2 dimensions), as a cloud (3 dimensions), and so on.

PCA tells us how much of the original dataset is actually explained by each form.

PCA factors in any correlation existing between your variables.

PCA works with as many samples and variables as needed — in the hundreds is quite common.

Decision-Making using PCA

PCA does not make the decision for us, but it gives us a good grasp of the options available. Say you want to decide which movie to watch next, based on reviews:

Movie Reviews

| Name | Variety | The New York Times | Vanity Fair | RogerEbert.com |
|---|---|---|---|---|
| Lightyear | 50% | 60% | 35% | 55% |
| Minions: The Rise of Gru | 30% | 40% | 65% | |
| The Batman | 45% | 60% | 65% | 30% |
| The Northman | 60% | 75% | 35% | 30% |
| Thor: Love and Thunder | 70% | 60% | 25% | 45% |
| Top Gun: Maverick | 50% | 60% | 65% | 25% |

(Movie buffs, beware: The scores have been changed to make sure each movie has an average score of 50%).

PCA will find the position of each movie along an axis called the first principal component:

(Chart: Movie Reviews along the first Principal Component)

For our next trip to the silver screen, based on these reviews, the starkest choice is between “The Northman” and “Minions”. And if the first one was up your alley, you’ll most likely appreciate “Thor” too. The same goes for “Top Gun” and “Lightyear”.

Of course, by summarising all these reviews along a single axis, we’re losing information. But PCA also tells us we’re keeping 65.27% of it.
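For readers who want to see where such a ranking and percentage come from, here is a dependency-free sketch, not the Pack's actual implementation. It assumes the scores are only centred (not standardised) and finds the first principal component by power iteration on the covariance matrix. The table shows only three scores for Minions, so the fourth value (65) is an assumption chosen to give the 50% average mentioned above. For these reasons the printed share need not match the figure quoted in the text.

```python
import math

# Review scores in percent.
# Columns: Variety, The New York Times, Vanity Fair, RogerEbert.com.
# ASSUMPTION: the fourth Minions score (65) is not in the table; it is a
# placeholder that makes the row average 50% like the other movies.
reviews = {
    "Lightyear": [50, 60, 35, 55],
    "Minions: The Rise of Gru": [30, 40, 65, 65],
    "The Batman": [45, 60, 65, 30],
    "The Northman": [60, 75, 35, 30],
    "Thor: Love and Thunder": [70, 60, 25, 45],
    "Top Gun: Maverick": [50, 60, 65, 25],
}

names = list(reviews)
rows = [reviews[name] for name in names]
n, p = len(rows), len(rows[0])

# Step 1: centre each column (reviewer) on its mean.
means = [sum(r[j] for r in rows) / n for j in range(p)]
centred = [[r[j] - means[j] for j in range(p)] for r in rows]

# Step 2: covariance matrix between reviewers.
cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
       for i in range(p)]

# Step 3: power iteration. Repeatedly applying the covariance matrix to a
# vector pulls it towards the direction of largest variance.
axis = [1.0, 0.0, 0.0, 0.0]  # fixed starting vector, for reproducibility
for _ in range(500):
    nxt = [sum(cov[i][j] * axis[j] for j in range(p)) for i in range(p)]
    norm = math.sqrt(sum(x * x for x in nxt))
    axis = [x / norm for x in nxt]

# Step 4: position of each movie along that axis, and share of variance kept.
scores = {name: sum(c[j] * axis[j] for j in range(p))
          for name, c in zip(names, centred)}
variance_kept = sum(axis[i] * cov[i][j] * axis[j]
                    for i in range(p) for j in range(p))
total_variance = sum(cov[i][i] for i in range(p))
share = variance_kept / total_variance

for name, s in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{s:8.2f}  {name}")
print(f"share of variance kept: {share:.1%}")
```

The sign of the axis is arbitrary: flipping it reverses the ranking without changing which movies sit close together or far apart.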

If you can’t make up your mind between “Top Gun” and “Lightyear”, let’s have a look at their positions along the second principal component. Since we now have two coordinates, we can display a map along these two components:

(Chart: Movie Reviews along the first two Principal Components)

(The components are not shown on the diagram).

If you liked “The Batman” as much as the reviewers, it’s likely that you’ll prefer “Top Gun” over “Lightyear”.

By using the first two components, PCA tells us that we’re keeping 95.59% of the information of these reviews. So we can safely decide using only this map!
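As a rough illustration of how such a two-component map can be computed, the sketch below uses NumPy's singular value decomposition on the centred scores. It is an assumption-laden example rather than the Pack's method: the scores are centred but not standardised, and the fourth Minions score (65) is a placeholder for the value missing from the table, so the share it prints may differ from the one quoted above.

```python
import numpy as np

names = ["Lightyear", "Minions: The Rise of Gru", "The Batman",
         "The Northman", "Thor: Love and Thunder", "Top Gun: Maverick"]

# Columns: Variety, The New York Times, Vanity Fair, RogerEbert.com (percent).
# ASSUMPTION: the last Minions score (65) is not in the table; it is a
# placeholder that makes the row average 50% like the other movies.
scores = np.array([
    [50, 60, 35, 55],
    [30, 40, 65, 65],
    [45, 60, 65, 30],
    [60, 75, 35, 30],
    [70, 60, 25, 45],
    [50, 60, 65, 25],
], dtype=float)

# Centre each reviewer's column on its mean.
centred = scores - scores.mean(axis=0)

# Rows of `components` are the principal components, strongest first.
_, singular_values, components = np.linalg.svd(centred, full_matrices=False)

# Map coordinates: each movie projected onto the first two components.
coords = centred @ components[:2].T

# Share of the total variance carried by each component.
explained = singular_values**2 / np.sum(singular_values**2)
kept = explained[:2].sum()

def map_distance(a, b):
    """Distance between two movies on the two-component map."""
    return float(np.linalg.norm(coords[names.index(a)] - coords[names.index(b)]))

for name, (x, y) in zip(names, coords):
    print(f"{name:26s} {x:8.2f} {y:8.2f}")
print(f"share kept by two components: {kept:.1%}")
print("The Batman to Top Gun:  ", round(map_distance("The Batman", "Top Gun: Maverick"), 2))
print("The Batman to Lightyear:", round(map_distance("The Batman", "Lightyear"), 2))
```

Comparing distances on the map is how a statement such as "if you liked The Batman, you'll likely prefer Top Gun over Lightyear" can be read off: the closer two movies sit, the more alike their reviews.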

💡 Now, the curious minds will ask: in the chart above, what does it mean to move from left to right, or from bottom to top?

This is what’s next: how to interpret the Principal Components.
