Clearer Decision-Making with PCA

How to Use the PCA Pack

Install this Pack like any other:

Type “/pack”

Search for PCA

Follow the > on the right

Press Add to doc

PCA computes values (matrices, really) from a dataset of samples, each of which has a value for every variable. In Coda, a table naturally represents the dataset: each sample is a row, each variable is a column.

This pack provides you with two sync tables:

Principal Components, which gives you the values of the first two principal components for each sample in your dataset,

Loadings, which gives you the weight of each variable in each principal component.
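To make the two outputs concrete, here is a minimal Python sketch of how scores (the Principal Components table) and loadings (the Loadings table) relate to a dataset. It is an illustration, not the Pack's own code: it assumes each variable is standardized before the decomposition, uses NumPy's SVD, and runs on made-up toy data. The function name `pca` and the variable names are hypothetical.

```python
import numpy as np

def pca(data):
    """Return (scores, loadings, cumulative_explained) for a samples-by-variables array."""
    x = np.asarray(data, dtype=float)
    # Assumption: standardize each variable (zero mean, unit variance).
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # SVD of the standardized data; the rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt.T          # one row per variable, one column per component
    scores = z @ loadings    # one row per sample, one column per component
    explained = s**2 / np.sum(s**2)
    return scores, loadings, np.cumsum(explained)

# Hypothetical toy data: 5 samples (rows), 3 variables (columns).
toy = [[2.0, 1.0, 0.5],
       [1.5, 2.5, 1.0],
       [3.0, 0.5, 2.0],
       [0.5, 3.0, 1.5],
       [2.5, 1.5, 3.0]]

scores, loadings, cumulative = pca(toy)
pc1_pc2 = scores[:, :2]  # the two columns a Pc1/Pc2 table would hold
```

In this sketch, `pc1_pc2` plays the role of the Pc1 and Pc2 columns, `loadings` the role of the per-variable weights, and `cumulative` the role of a "Percentage Explained" row.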

To use the Principal Components sync table

Drag the table to your document from the Pack tab on the right,

On the table, press the Choose what to sync button,

In the Label entry field, select the column with the names of your samples (e.g. MovieReviews.MovieName, DrinkingHabits.CountryName, etc),

In the Variable1 entry field, select the column with the values for your first variable (e.g. MovieReviews.Vanity, or DrinkingHabits.Spirits),

In the Variable2 entry field, select the column with the values for your second variable (e.g. MovieReviews.TheNewYorkTimes, or DrinkingHabits.Wine),

Add as many variables as needed (up to 6) using Add criteria,

Once all variables have been added, press Start sync.

The sync table returns a single column, “Principal Components”. Hover over the column to display the pulldown menu marked with a small jigsaw piece, and choose “PCA Pack - Principal Components options”. In the dialog box that opens, click “Related columns” and Add each one of them.

Now the sync table shows all samples as rows, with their own label, and their value along the first Principal Component (column Pc1) and the second PC (column Pc2):

Drinking Habits by Principal Components

| Label | Pc1 | Pc2 |
| --- | --- | --- |
| France | -1.395 | -1.619 |
| Italy | -1.760 | -0.808 |
| Switzerland | -1.102 | -0.372 |
| Austria | -0.332 | 1.120 |
|  | 0.162 | 0.931 |
| USA | 0.445 | 0.405 |
| Russia | 3.409 | -2.056 |
| Czech Republic | 1.403 | 2.076 |
| Japan | -0.722 | -0.126 |
| Mexico | -0.108 | 0.448 |

The sync table can be displayed as a Scatter chart, with Pc1 as the horizontal axis, Pc2 as the vertical axis, and segmented by Label.

To use the Loadings sync table

Drag the table to your document from the Pack tab on the right,

On the table, press the Choose what to sync button,

In the VariableNames entry field, enter the list of your variable names, e.g.

=List("Spirits", "Wine", "Beer", "Life Expectancy", "Heart Disease Rate")

Alternatively, create a table with a Text column, with each row listing one variable name, and in the VariableNames entry field, select this Text column, e.g.:

Variable Names

| Name |
| --- |
| Spirits |
| Wine |
| Beer |
| Life Expectancy |
| Heart Disease Rate |

In the Variable1 entry field, select the column with the values for your first variable (e.g. MovieReviews.Vanity, or DrinkingHabits.Spirits)

In the Variable2 entry field, select the column with the values for your second variable (e.g. MovieReviews.TheNewYorkTimes, or DrinkingHabits.Wine)

Add as many variables as needed (up to 6) using Add criteria,

Once all variables have been added, press Start sync.

The sync table returns a single column, “Loadings”. Hover over the column to display the pulldown menu marked with a small jigsaw piece, and choose “PCA Pack - Loadings options”. In the dialog box that opens, click “Related columns” and Add each one of them.

Now the sync table shows all variables as rows, with their own label, and their weight for each principal component:

Drinking Habits: Loadings

| Variable Name | Principal Component1 | Principal Component2 | Principal Component3 | Principal Component4 | Principal Component5 | Principal Component6 |
| --- | --- | --- | --- | --- | --- | --- |
| Spirits | 0.35 | -0.57 | -0.214 | -0.635 | -0.329 |  |
| Wine | -0.45 | -0.38 | -0.618 | 0.448 | -0.276 |  |
| Beer | 0.07 | 0.72 | -0.425 | -0.207 | -0.497 |  |
| Life Expectancy | -0.58 | 0.09 | -0.269 | -0.567 | 0.506 |  |
| Heart Disease Rate | 0.58 | 0.04 | -0.565 | 0.177 | 0.559 |  |
| Drinking Percentage Explained | 0.46 | 0.78 | 0.898 | 0.983 | 1.000 |  |

The table reads as follows:

Principal Component 1 = 0.35 * Spirits - 0.45 * Wine + 0.07 * Beer - 0.58 * Life Expectancy + 0.58 * Heart Disease Rate
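The weighted sum above can be sketched in a few lines of Python. The loadings are copied from the Principal Component1 column of the Loadings table; the sample values are hypothetical, and the sketch assumes they have already been standardized (the preprocessing the Pack applies is not described in this section). The helper name `pc1_score` is illustrative only.

```python
# Loadings for Principal Component 1, copied from the Loadings table.
PC1_LOADINGS = {
    "Spirits": 0.35,
    "Wine": -0.45,
    "Beer": 0.07,
    "Life Expectancy": -0.58,
    "Heart Disease Rate": 0.58,
}

def pc1_score(sample):
    """Weighted sum of a sample's variable values, using the PC1 loadings."""
    return sum(weight * sample[name] for name, weight in PC1_LOADINGS.items())

# Hypothetical sample; values are assumed to be standardized already.
sample = {
    "Spirits": 1.0,
    "Wine": -1.0,
    "Beer": 0.0,
    "Life Expectancy": -0.5,
    "Heart Disease Rate": 0.5,
}

score = pc1_score(sample)

# The squared loadings of one component sum to roughly 1 (up to table rounding).
squared_norm = sum(weight * weight for weight in PC1_LOADINGS.values())
```

A sample with high Spirits and Heart Disease Rate and low Wine and Life Expectancy gets a large positive value on this component, which is how the signs in the Loadings table translate into positions on the Pc1 axis.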

The last row of the sync table gives the cumulative share of the data's variance explained by the first PC, the first two PCs, the first three PCs, and so on. You can retrieve the first of these values with this formula (replacing XXX with the corresponding name):

Format("{1}%", 100 * Loadings.Filter(Variable Name = "XXX Percentage Explained").Principal Component1)
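Because the last row is cumulative, the share contributed by each individual component is the difference between neighbouring values. The following Python sketch works this out for the "Drinking Percentage Explained" row of the table; the variable names are illustrative, and the percentage formatting only mirrors what the Coda formula is meant to display.

```python
# Cumulative values copied from the "Drinking Percentage Explained" row.
cumulative = [0.46, 0.78, 0.898, 0.983, 1.000]

# Share explained by each component alone: difference with the previous value.
previous = [0.0] + cumulative[:-1]
individual = [round(c - p, 3) for p, c in zip(previous, cumulative)]

# Percentage for the first component, formatted like the Coda formula's output.
first_pc_label = f"{100 * cumulative[0]:.0f}%"
```

Read this way, the first two components together account for 0.78 of the variance, which is the figure that matters when only Pc1 and Pc2 are plotted.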

To display only the Loading values, you need to filter out the last row: in the Filter tab on the right, press Add filter and select “Variable Name” “does not contain” “Percentage Explained”.

To Use Two Datasets in a Doc

Coda allows only one instance of a sync table per doc, so the same sync table is used for all your datasets. To add a second dataset to the Principal Components or Loadings sync table:

On the sync table, press Options

On the tab on the right, choose the PCA Pack

Press Add another sync

Select the data for your second analysis as you did for the first dataset

Once done, give your dataset a name by using Add criteria and selecting “Group”. In the entry field for the Group criterion, enter a name of your choosing, e.g. “MovieReviews” or “Drinking”

It’s advisable to also give a “Group” criterion to your first dataset

Press Start sync

To use the results of the PCA for your second dataset, create a view of the sync table and Add filter, selecting “Group” “is equal to” the group name you chose before.

You can use more than two datasets by repeating the steps above.
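The Group mechanism amounts to tagging every result row with a dataset name and filtering on it in each view. A rough Python sketch of that idea follows; the France and Italy rows reuse values from the Principal Components table, while the "MovieReviews" row and the `view` helper are hypothetical placeholders.

```python
# Rows as a shared sync table might hold them once each dataset has a Group.
rows = [
    {"Group": "Drinking", "Label": "France", "Pc1": -1.395, "Pc2": -1.619},
    {"Group": "Drinking", "Label": "Italy", "Pc1": -1.760, "Pc2": -0.808},
    # Hypothetical row standing in for a second dataset.
    {"Group": "MovieReviews", "Label": "Movie A", "Pc1": 0.5, "Pc2": -0.2},
]

def view(all_rows, group):
    """Keep only the rows of one dataset, like a view filtered on Group."""
    return [row for row in all_rows if row["Group"] == group]

drinking_rows = view(rows, "Drinking")
movie_rows = view(rows, "MovieReviews")
```

Each filtered view then behaves like a separate results table, even though all rows live in the same sync table.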
