Clearer Decision-Making with PCA

# Theory

Principal Component Analysis is essentially a change of basis for your dataset. Say your dataset has N variables and P samples. Each sample of your dataset has a natural representation as a point in a space of N dimensions, with its values being the coordinates in the basis {(1, 0...0), (0, 1, 0...0), ..., (0...0, 1)}.
PCA finds a new basis using linear combinations of this basis. This new basis is made of the principal components. There are thus N principal components.
PCA uses two steps to find these linear combinations:
1. Calculate the covariance matrix K for the variables. For each pair of variables, K stores their covariance, that is, their joint variability over the samples:
Samples

| Sample | Variable A | Variable B | Variable C |
| --- | --- | --- | --- |
| s1 | 1 | 2 | 3 |
| s2 | 4 | 5 | 6 |
| s3 | 7 | 8 | 9 |
| s4 | 10 | 11 | 12 |
Covariance matrix

| Name | A | B | C |
| --- | --- | --- | --- |
| A | cov(A,A) | cov(A,B) | cov(A,C) |
| B | cov(B,A) | cov(B,B) | cov(B,C) |
| C | cov(C,A) | cov(C,B) | cov(C,C) |

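For the samples above, the covariance can be written as a sum over every ordered pair of samples (i, j), divided by 2P² (here P = 4, so 2P² = 32):
cov(A, B) = [ (4-1)×(5-2) + (7-1)×(8-2) + (10-1)×(11-2) + (1-4)×(2-5) + (7-4)×(8-5) + (10-4)×(11-5) + (1-7)×(2-8) + (4-7)×(5-8) + (10-7)×(11-8) + (1-10)×(2-11) + (4-10)×(5-11) + (7-10)×(8-11) ] / 32 = 360 / 32 = 11.25
Repeating this for every pair of variables fills K, so the covariance matrix K is a symmetric N×N matrix.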
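The covariance step can be sketched in a few lines of plain Python. This is only an illustration of the formula above, not the Pack's actual code: the helper names `cov_pairwise` and `cov_mean` are made up for this example, and the second one uses the more common "deviation from the mean" form to show that both give the same number.

```python
from itertools import product

# The four samples from the table above, one (A, B, C) tuple per sample.
samples = [
    (1, 2, 3),
    (4, 5, 6),
    (7, 8, 9),
    (10, 11, 12),
]

def column(data, k):
    """Values of variable number k across all samples."""
    return [row[k] for row in data]

def cov_pairwise(x, y):
    """Population covariance as a sum over ordered pairs, divided by 2 * P^2."""
    p = len(x)
    total = sum((x[i] - x[j]) * (y[i] - y[j])
                for i, j in product(range(p), repeat=2))
    return total / (2 * p * p)

def cov_mean(x, y):
    """Same quantity, computed from deviations around the mean."""
    p = len(x)
    mx, my = sum(x) / p, sum(y) / p
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / p

n = len(samples[0])
# K[r][c] holds the covariance between variable r and variable c.
K = [[cov_pairwise(column(samples, r), column(samples, c)) for c in range(n)]
     for r in range(n)]

for row in K:
    print(row)
```

Every entry comes out as 11.25 here because B and C are just A shifted by a constant, so the three variables vary together perfectly.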

2. Calculate the eigenvectors (V1, V2, ..., VN) and eigenvalues (a1, a2, ..., aN) of the covariance matrix by solving:
K V = a V
The eigenvectors, once ordered by decreasing eigenvalue, are the principal components and form the new basis. These vectors are orthogonal to each other.
The coordinates of the eigenvectors in the old basis are what the Loadings sync table returns. They form the change-of-basis matrix.
The coordinates of the samples in the new basis are what the Principal Components sync table returns.
The last row of the Loadings sync table is the percentage explained. It is cumulative and, with the eigenvalues sorted so that a1 ≥ a2 ≥ ... ≥ aN, is defined as
{a1/(a1+...+aN), (a1+a2)/(a1+...+aN), ..., 1}
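To make the second step concrete, here is a minimal sketch for the two-variable case, where the eigenvalues of a symmetric 2×2 matrix have a closed form. The dataset and the helper names (`covariance_2d`, `eigen_2x2`) are hypothetical and chosen only so the numbers come out round; a real implementation would typically call a linear algebra library instead.

```python
import math

# Hypothetical two-variable dataset (not taken from the doc).
points = [(-2.0, 0.0), (0.0, 0.0), (0.0, -2.0), (2.0, 2.0)]

def covariance_2d(pts):
    """Return the means and the entries a, b, c of K = [[a, b], [b, c]]."""
    p = len(pts)
    mx = sum(x for x, _ in pts) / p
    my = sum(y for _, y in pts) / p
    a = sum((x - mx) ** 2 for x, _ in pts) / p
    c = sum((y - my) ** 2 for _, y in pts) / p
    b = sum((x - mx) * (y - my) for x, y in pts) / p
    return (mx, my), (a, b, c)

def eigen_2x2(a, b, c):
    """Eigenpairs of [[a, b], [b, c]], largest eigenvalue first."""
    if b == 0:
        # Already diagonal: the old basis vectors are the eigenvectors.
        return sorted([(a, (1.0, 0.0)), (c, (0.0, 1.0))],
                      key=lambda pair: pair[0], reverse=True)
    mid, radius = (a + c) / 2, math.hypot((a - c) / 2, b)
    pairs = []
    for lam in (mid + radius, mid - radius):
        vx, vy = b, lam - a              # solves (K - lam * I) V = 0
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs

(mean_x, mean_y), (a, b, c) = covariance_2d(points)
pairs = eigen_2x2(a, b, c)
eigenvalues = [lam for lam, _ in pairs]
loadings = [vec for _, vec in pairs]     # eigenvector coordinates in the old basis

# Coordinates of each centered sample in the new basis.
scores = [
    tuple((x - mean_x) * vx + (y - mean_y) * vy for vx, vy in loadings)
    for x, y in points
]

# Cumulative share of variance: a1/total, (a1+a2)/total, ...
total = sum(eigenvalues)
explained, running = [], 0.0
for lam in eigenvalues:
    running += lam
    explained.append(running / total)

print(eigenvalues, loadings, explained)
```

For this toy dataset K is [[2, 1], [1, 2]], so the two eigenvectors point along the diagonals of the plane and the first one carries three quarters of the total variance.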

# Practice

Below are the sync tables we used for our examples. Coda allows only one instance of each table per doc, so both Principal Components and Loadings contain all our datasets. The datasets are separated using the Group column.
Principal Components

| Group | Label | Pc1 | Pc2 |
| --- | --- | --- | --- |
| Drinking | France | -1.395 | -1.619 |
| Drinking | Italy | -1.760 | -0.808 |
| Drinking | Switzerland | -1.102 | -0.372 |
| Drinking | Austria | -0.332 | 1.120 |
| Drinking | UK | 0.162 | 0.931 |
| Drinking | USA | 0.445 | 0.405 |
| Drinking | Russia | 3.409 | -2.056 |
| Drinking | Czech Republic | 1.403 | 2.076 |
| Drinking | Japan | -0.722 | -0.126 |
| Drinking | Mexico | -0.108 | 0.448 |
| MovieReviews | Thor: Love and Thunder | 1.340 | -1.202 |
| MovieReviews | The Northman | 1.794 | 0.252 |
| MovieReviews | Top Gun: Maverick | -0.002 | 1.339 |
| MovieReviews | Minions: The Rise of Gru | -2.815 | -0.558 |
| MovieReviews | Lightyear | 0.014 | -1.020 |
| MovieReviews | The Batman | -0.332 | 1.188 |
| HeightWeight | 1 | -1.632 | -0.051 |
| HeightWeight | 2 | 1.849 | -0.753 |
| HeightWeight | 3 | 2.054 | 0.997 |
| HeightWeight | 4 | 0.992 | 0.795 |
| HeightWeight | 5 | 0.951 | 1.068 |
| HeightWeight | 6 | 0.042 | -0.505 |
| HeightWeight | 7 | 1.518 | 0.169 |
| HeightWeight | 8 | 1.297 | -0.205 |
| HeightWeight | 9 | -0.896 | -0.860 |
| HeightWeight | 10 | -0.814 | 0.039 |
| HeightWeight | 11 | -0.518 | 0.545 |
| HeightWeight | 12 | -0.894 | -0.653 |
| HeightWeight | 13 | 0.032 | -0.223 |
| HeightWeight | 14 | -0.584 | 0.021 |
| HeightWeight | 15 | -0.538 | -0.778 |
| HeightWeight | 16 | 1.900 | -0.389 |
| HeightWeight | 17 | -0.408 | 0.678 |
| HeightWeight | 18 | 1.186 | 0.676 |
| HeightWeight | 19 | 1.827 | -0.564 |
| HeightWeight | 20 | -0.487 | 0.111 |
| HeightWeight | 21 | 0.787 | 0.875 |
| HeightWeight | 22 | 1.304 | 0.626 |
| HeightWeight | 23 | -3.362 | -0.105 |
| HeightWeight | 24 | 0.306 | -0.037 |
| HeightWeight | 25 | 0.748 | 0.981 |
| HeightWeight | 26 | -0.122 | 0.417 |
| HeightWeight | 27 | 1.952 | -0.155 |
| HeightWeight | 28 | 0.088 | 0.423 |
| HeightWeight | 29 | -1.634 | -0.599 |
| HeightWeight | 30 | -1.703 | 0.126 |
| HeightWeight | 31 | -0.842 | -1.986 |
| HeightWeight | 32 | -1.162 | 0.397 |
| HeightWeight | 33 | -0.132 | -0.037 |
| HeightWeight | 34 | 1.498 | -0.434 |
| HeightWeight | 35 | 2.164 | -0.642 |
| HeightWeight | 36 | 0.550 | -0.369 |
| HeightWeight | 37 | 0.443 | 1.281 |
| HeightWeight | 38 | -0.460 | -0.249 |
| HeightWeight | 39 | 0.193 | 0.295 |
| HeightWeight | 40 | -2.634 | 0.209 |
| HeightWeight | 41 | 0.057 | -0.395 |
| HeightWeight | 42 | -1.149 | 0.870 |
| HeightWeight | 43 | 1.359 | 0.113 |
| HeightWeight | 44 | 0.607 | 0.593 |
| HeightWeight | 45 | -1.946 | -0.510 |
| HeightWeight | 46 | 0.353 | -0.172 |
| HeightWeight | 47 | 0.701 | 1.481 |
| HeightWeight | 48 | -0.706 | -0.524 |
| HeightWeight | 49 | 1.651 | 0.288 |
| HeightWeight | 50 | 0.871 | 0.040 |
| HeightWeight | 51 | 1.885 | 0.456 |
| HeightWeight | 52 | -0.279 | 0.173 |
| HeightWeight | 53 | 0.743 | -0.949 |
| HeightWeight | 54 | -0.261 | -1.100 |
| HeightWeight | 55 | -1.157 | 0.716 |
| HeightWeight | 56 | 2.035 | 0.409 |
| HeightWeight | 57 | 2.592 | 0.799 |
| HeightWeight | 58 | -0.464 | 0.564 |
| HeightWeight | 59 | -1.044 | 0.115 |
| HeightWeight | 60 | 0.240 | 0.539 |
| HeightWeight | 61 | -0.439 | 0.617 |
| HeightWeight | 62 | 0.993 | 0.228 |
| HeightWeight | 63 | 0.278 | 0.022 |
| HeightWeight | 64 | -0.248 | 0.437 |
| HeightWeight | 65 | 1.521 | -0.564 |
| HeightWeight | 66 | -0.943 | -1.140 |
| HeightWeight | 67 | 1.306 | 0.497 |
| HeightWeight | 68 | 0.247 | 0.407 |
| HeightWeight | 69 | -1.667 | -1.135 |
| HeightWeight | 70 | -0.359 | 0.064 |
| HeightWeight | 71 | -0.853 | 1.100 |
| HeightWeight | 72 | 1.455 | -0.651 |
| HeightWeight | 73 | 1.497 | 0.061 |
| HeightWeight | 74 | -2.775 | -0.108 |
| HeightWeight | 75 | 0.186 | -0.033 |
| HeightWeight | 76 | -0.989 | 0.170 |
| HeightWeight | 77 | 0.822 | 0.523 |
| HeightWeight | 78 | -0.561 | 1.239 |
| HeightWeight | 79 | -0.041 | -1.331 |
| HeightWeight | 80 | -0.358 | -0.198 |
| HeightWeight | 81 | 0.689 | 0.186 |
| HeightWeight | 82 | -0.741 | 0.112 |
| HeightWeight | 83 | 2.430 | 0.899 |
| HeightWeight | 84 | -0.507 | 0.710 |
| HeightWeight | 85 | 0.567 | -0.345 |
| HeightWeight | 86 | 1.154 | 0.294 |
| HeightWeight | 87 | 0.593 | 1.023 |
| HeightWeight | 88 | 1.038 | -0.522 |
| HeightWeight | 89 | 0.452 | -1.173 |
| HeightWeight | 90 | 0.356 | 0.151 |
| HeightWeight | 91 | 0.617 | 0.486 |
| HeightWeight | 92 | 1.678 | 0.009 |
| HeightWeight | 93 | 2.082 | -0.499 |
| HeightWeight | 94 | -0.436 | -1.347 |
| HeightWeight | 95 | 1.138 | -0.400 |
| HeightWeight | 96 | 1.222 | -0.681 |
| HeightWeight | 97 | -1.030 | 0.180 |
| HeightWeight | 98 | -1.891 | 1.403 |
| HeightWeight | 99 | -0.376 | 0.484 |
| HeightWeight | 100 | -0.352 | -1.037 |
| HeightWeight | 101 | -2.608 | -0.363 |
| HeightWeight | 102 | -0.128 | 0.498 |
| HeightWeight | 103 | 0.557 | 0.266 |
| HeightWeight | 104 | -2.542 | -0.837 |
| HeightWeight | 105 | -0.818 | -0.679 |
| HeightWeight | 106 | -0.195 | -0.246 |
| HeightWeight | 107 | -0.352 | -0.141 |
| HeightWeight | 108 | -0.641 | 0.554 |
| HeightWeight | 109 | 0.614 | -0.472 |
| HeightWeight | 110 | -1.500 | 1.496 |
| HeightWeight | 111 | -0.389 | -0.221 |
| HeightWeight | 112 | 0.788 | -0.742 |
| HeightWeight | 113 | 1.078 | -0.555 |
| HeightWeight | 114 | -1.635 | -0.177 |
| HeightWeight | 115 | 0.444 | -1.057 |
| HeightWeight | 116 | 0.319 | -0.162 |
| HeightWeight | 117 | 0.636 | 0.009 |
| HeightWeight | 118 | 1.182 | -0.137 |
| HeightWeight | 119 | -1.574 | 0.240 |
| HeightWeight | 120 | 0.893 | 0.251 |
| HeightWeight | 121 | -1.217 | 0.349 |
| HeightWeight | 122 | -1.346 | -0.807 |
| HeightWeight | 123 | 0.598 | -0.474 |
| HeightWeight | 124 | 0.838 | 0.116 |
| HeightWeight | 125 | -1.207 | -1.200 |
| HeightWeight | 126 | -1.078 | 0.613 |
| HeightWeight | 127 | -1.575 | -0.519 |
| HeightWeight | 128 | -0.475 | -0.461 |
| HeightWeight | 129 | 1.112 | 0.427 |
| HeightWeight | 130 | 1.234 | -0.435 |
| HeightWeight | 131 | 0.433 | 0.200 |
| HeightWeight | 132 | 1.401 | -0.990 |
| HeightWeight | 133 | -0.270 | -1.109 |
| HeightWeight | 134 | -1.497 | 1.091 |
| HeightWeight | 135 | 1.202 | -1.094 |
| HeightWeight | 136 | 0.615 | 0.418 |
| HeightWeight | 137 | -1.366 | 0.114 |
| HeightWeight | 138 | -0.210 | 0.162 |
| HeightWeight | 139 | 3.597 | -0.740 |
| HeightWeight | 140 | 0.928 | -0.552 |
| HeightWeight | 141 | 1.104 | -0.041 |
| HeightWeight | 142 | -1.826 | 0.192 |
| HeightWeight | 143 | -0.090 | -0.134 |
| HeightWeight | 144 | 0.163 | -0.121 |
| HeightWeight | 145 | -2.195 | -0.177 |
| HeightWeight | 146 | -0.523 | -0.749 |
| HeightWeight | 147 | 0.617 | 0.165 |
| HeightWeight | 148 | -2.104 | 0.410 |
| HeightWeight | 149 | 0.486 | -0.068 |
| HeightWeight | 150 | 0.966 | 0.280 |
| HeightWeight | 151 | 0.472 | -0.687 |
| HeightWeight | 152 | 0.465 | 0.865 |
| HeightWeight | 153 | -0.914 | 0.157 |
| HeightWeight | 154 | 0.057 | 1.471 |
| HeightWeight | 155 | 2.199 | -1.074 |
| HeightWeight | 156 | -2.640 | 0.144 |
| HeightWeight | 157 | 2.981 | 0.772 |
| HeightWeight | 158 | -1.254 | -0.926 |
| HeightWeight | 159 | 2.167 | -0.800 |
| HeightWeight | 160 | -1.631 | 0.293 |
| HeightWeight | 161 | 0.810 | 1.444 |
| HeightWeight | 162 | -2.322 | 0.272 |
| HeightWeight | 163 | 1.217 | 0.182 |
| HeightWeight | 164 | -0.276 | -0.590 |
| HeightWeight | 165 | -0.898 | 1.026 |
| HeightWeight | 166 | -0.294 | 0.296 |
| HeightWeight | 167 | -0.549 | -0.869 |
| HeightWeight | 168 | -0.748 | 0.462 |
| HeightWeight | 169 | -0.293 | 0.261 |
| HeightWeight | 170 | -1.657 | -0.229 |
| HeightWeight | 171 | 0.267 | -0.812 |
| HeightWeight | 172 | -0.171 | -0.185 |
| HeightWeight | 173 | -0.221 | -0.083 |
| HeightWeight | 174 | -1.428 | 0.518 |
| HeightWeight | 175 | 2.857 | -1.429 |
| HeightWeight | 176 | -1.739 | -0.908 |
| HeightWeight | 177 | -0.633 | 0.139 |
| HeightWeight | 178 | -1.298 | 0.313 |
| HeightWeight | 179 | -1.057 | 0.379 |
| HeightWeight | 180 | -0.970 | 0.028 |
| HeightWeight | 181 | 0.155 | 0.767 |
| HeightWeight | 182 | -1.408 | 0.108 |
| HeightWeight | 183 | -1.493 | -0.641 |
| HeightWeight | 184 | 0.110 | -0.465 |
| HeightWeight | 185 | 0.197 | 1.451 |
| HeightWeight | 186 | -0.611 | 0.839 |
| HeightWeight | 187 | 0.268 | -0.191 |
| HeightWeight | 188 | -0.868 | 0.123 |
| HeightWeight | 189 | -0.332 | 0.382 |
| HeightWeight | 190 | -1.475 | -1.570 |
| HeightWeight | 191 | 1.867 | 0.234 |
| HeightWeight | 192 | -1.847 | -0.076 |
| HeightWeight | 193 | 0.155 | 0.512 |
| HeightWeight | 194 | 0.934 | 1.327 |
| HeightWeight | 195 | 1.709 | 0.463 |
| HeightWeight | 196 | -1.161 | 0.406 |
| HeightWeight | 197 | -1.347 | -0.006 |
| HeightWeight | 198 | 0.169 | -0.042 |
| HeightWeight | 199 | 0.040 | -0.011 |
| HeightWeight | 200 | 1.293 | -1.215 |

Loadings

| Group | Variable Name | Principal Component1 | Principal Component2 | Principal Component3 | Principal Component4 | Principal Component5 | Principal Component6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Drinking | Spirits | 0.35 | -0.57 | -0.214 | -0.635 | -0.329 | 0 |
| Drinking | Wine | -0.45 | -0.38 | -0.618 | 0.448 | -0.276 | 0 |
| Drinking | Beer | 0.07 | 0.72 | -0.425 | -0.207 | -0.497 | 0 |
| Drinking | Life Expectancy | -0.58 | 0.09 | -0.269 | -0.567 | 0.506 | 0 |
| Drinking | Heart Disease Rate | 0.58 | 0.04 | -0.565 | 0.177 | 0.559 | 0 |
| Drinking | Drinking Percentage Explained | 0.46 | 0.78 | 0.898 | 0.983 | 1.000 | 1 |
| MovieReviews | Variety | 0.58 | -0.20 | 0.650 | 0.449 | 0.000 | 0 |
| MovieReviews | The New York Times | 0.57 | 0.22 | -0.698 | 0.369 | 0.000 | 0 |
| MovieReviews | Vanity Fair | -0.44 | 0.63 | 0.162 | 0.617 | 0.000 | 0 |
| MovieReviews | RogerEbert.com | -0.37 | -0.72 | -0.253 | 0.531 | 0.000 | 0 |
| MovieReviews | MovieReviews Percentage Explained | 0.65 | 0.96 | 1.000 | 1.000 | 1.000 | 1 |
| HeightWeight | Height | 0.71 | -0.71 | 0.000 | 0.000 | 0.000 | 0 |
| HeightWeight | Weight | 0.71 | 0.71 | 0.000 | 0.000 | 0.000 | 0 |
| HeightWeight | HeightWeight Percentage Explained | 0.78 | 1.00 | 1.000 | 1.000 | 1.000 | 1 |
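
Because the Percentage Explained rows are cumulative, it can help to turn them back into per-component shares and to count how many components are needed to reach a given share of the variance. The sketch below does this with the values copied from the Loadings table; the helper names `per_component` and `components_needed` and the 75% threshold are assumptions made for this example, not part of the Pack.

```python
# Cumulative "Percentage Explained" rows copied from the Loadings table,
# truncated to the number of variables in each dataset.
cumulative = {
    "Drinking": [0.46, 0.78, 0.898, 0.983, 1.000],
    "MovieReviews": [0.65, 0.96, 1.000, 1.000],
    "HeightWeight": [0.78, 1.00],
}

def per_component(cum):
    """Turn cumulative shares into the share carried by each component."""
    return [round(cur - prev, 3) for prev, cur in zip([0.0] + cum[:-1], cum)]

def components_needed(cum, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    for count, share in enumerate(cum, start=1):
        if share >= threshold:
            return count
    return len(cum)

for group, cum in cumulative.items():
    print(group, per_component(cum), components_needed(cum, 0.75))
```

With a 75% threshold, two components are enough for Drinking and MovieReviews and one is enough for HeightWeight, which is why the Principal Components table only needs the Pc1 and Pc2 columns to give a usable picture of each dataset.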

Variable Names Drinking

| Name | Description |
| --- | --- |
| Spirits | in litres, per year |
| Wine | in litres, per year |
| Beer | in litres, per year |
| Life Expectancy | in years |
| Heart Disease Rate | per 100,000 persons |

Variable Names Movie Reviews

| Name | Description |
| --- | --- |
| Variety | |
| The New York Times | |
| Vanity Fair | |
| RogerEbert.com | |

Variable Names Height vs Weight

| Name | Description |
| --- | --- |
| Height | in inches |
| Weight | in pounds |

Bonus for reaching the end of the last page!
(video posted on by u/PR0CR45T184T0R)