Clearer Decision-Making with PCA

Under The Hood

Theory

Principal Component Analysis is essentially a change of basis for your dataset. Say your dataset has N variables and P samples. Each sample has a natural representation as a point in an N-dimensional space, with its values being the coordinates in the standard basis {(1, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, ..., 0, 1)}.

PCA finds a new basis whose vectors are linear combinations of the original basis vectors. The vectors of this new basis are the principal components, so there are N principal components.

PCA uses two steps to find these linear combinations:

1. Calculate the covariance matrix K for the variables. For each pair of variables, K stores their covariance, that is, their joint variability over the samples:

Samples

Sample | Variable A | Variable B | Variable C

Covariance matrix

cov(A,A) | cov(A,B) | cov(A,C)
cov(B,A) | cov(B,B) | cov(B,C)
cov(C,A) | cov(C,B) | cov(C,C)

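For example, with A = (1, 4, 7, 10) and B = (2, 5, 8, 11) over four samples, the covariance sums the products of differences over every ordered pair of distinct samples and divides by 2 × 4² = 32:

cov(A, B) = [ (4-1)×(5-2) + (7-1)×(8-2) + (10-1)×(11-2) + (1-4)×(2-5) + (7-4)×(8-5) + (10-4)×(11-5) + (1-7)×(2-8) + (4-7)×(5-8) + (10-7)×(11-8) + (1-10)×(2-11) + (4-10)×(5-11) + (7-10)×(8-11) ] / 32 = 360 / 32 = 11.25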

Thus the covariance matrix K is an N×N matrix. It is symmetric, since cov(A, B) = cov(B, A).
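
The pairwise formula for cov(A, B) can be checked with a short sketch. It assumes, as the terms of the formula suggest, that A takes the values 1, 4, 7, 10 and B the values 2, 5, 8, 11 over four samples. The function names are illustrative and are not part of the PCA pack. The second function uses the more familiar mean-deviation form of the population covariance, which should agree with the pairwise form.

```python
from itertools import product


def covariance_pairwise(xs, ys):
    """Population covariance as a sum over all ordered pairs, divided by 2 * n^2."""
    n = len(xs)
    total = sum(
        (xs[j] - xs[i]) * (ys[j] - ys[i])
        for i, j in product(range(n), repeat=2)  # pairs with i == j contribute 0
    )
    return total / (2 * n * n)


def covariance_mean(xs, ys):
    """Population covariance as the average product of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Values assumed from the terms of the cov(A, B) formula in the text.
a = [1, 4, 7, 10]
b = [2, 5, 8, 11]

print(covariance_pairwise(a, b))  # 11.25
print(covariance_mean(a, b))      # 11.25
```

Both forms divide by the number of samples rather than by one less, so they give the population covariance; whether the pack itself uses that convention is not stated here.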

2. Calculate the eigenvectors (V1, V2, ..., VN) and eigenvalues (a1, a2, ..., aN) of the covariance matrix by solving, for each i:

K Vi = ai Vi

The eigenvectors are the principal components and, once ordered by decreasing eigenvalue, form the new basis. Because K is symmetric, these vectors are orthogonal to each other.
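
As a rough illustration of step 2, the sketch below uses NumPy's symmetric eigensolver on a small, hypothetical 2×2 covariance matrix. The matrix values and the function name principal_axes are assumptions made for the example and say nothing about how the pack computes its results.

```python
import numpy as np


def principal_axes(K):
    """Return eigenvalues in decreasing order and matching eigenvectors as columns."""
    eigenvalues, eigenvectors = np.linalg.eigh(K)  # eigh returns ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1]          # reorder by decreasing eigenvalue
    return eigenvalues[order], eigenvectors[:, order]


# Hypothetical covariance matrix, chosen only for illustration.
K = np.array([[2.0, 1.0],
              [1.0, 2.0]])

values, vectors = principal_axes(K)
print(values)                 # approximately 3 and 1
print(vectors.T @ vectors)    # approximately the identity: the axes are orthonormal
```

The sign of each eigenvector is arbitrary, so two tools can return principal components that differ by a factor of -1 and still describe the same basis.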

The coordinates of the eigenvectors in the old basis are what the Loadings sync table returns. They form the change-of-basis matrix.

The coordinates of the samples in the new basis are what the Principal Components sync table returns.
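
To make the two outputs concrete, here is a minimal sketch that produces both a loadings matrix and the sample coordinates in the new basis. It is an assumption-laden illustration: it centers the data but does not rescale it, whereas the pack may standardize variables first, and the function name pca_scores and the input values are hypothetical.

```python
import numpy as np


def pca_scores(X):
    """Return (loadings, scores) for a samples-by-variables array X."""
    centered = X - X.mean(axis=0)                   # move the origin to the mean sample
    K = np.cov(centered, rowvar=False, bias=True)   # N x N covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(K)   # K is symmetric
    order = np.argsort(eigenvalues)[::-1]           # decreasing eigenvalue
    loadings = eigenvectors[:, order]               # columns are the principal components
    scores = centered @ loadings                    # sample coordinates in the new basis
    return loadings, scores


# Illustrative 4-sample, 2-variable dataset (two perfectly correlated variables).
X = np.array([[1.0, 2.0],
              [4.0, 5.0],
              [7.0, 8.0],
              [10.0, 11.0]])

loadings, scores = pca_scores(X)
print(loadings)  # change-of-basis matrix, one column per principal component
print(scores)    # one row per sample; the second column is ~0 for this dataset
```

Because the two variables here move together exactly, all the variation lies along the first principal component and the second coordinate of every sample is essentially zero.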

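The last row of each group in the Loadings sync table is the percentage explained: the cumulative share of the total variance captured by the first principal components, expressed as a fraction between 0 and 1 and defined as

{a1/(a1+...+aN), (a1+a2)/(a1+...+aN), ..., 1}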
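
The cumulative ratio can be sketched in a few lines. The eigenvalues below are hypothetical and the function name is illustrative; only the formula itself comes from the text.

```python
from itertools import accumulate


def cumulative_explained(eigenvalues):
    """Cumulative fraction of total variance, with eigenvalues taken in decreasing order."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [partial / total for partial in accumulate(ordered)]


# Hypothetical eigenvalues a1 = 3 and a2 = 1.
print(cumulative_explained([3.0, 1.0]))  # [0.75, 1.0]
```

The last entry is always 1, since all N principal components together account for all of the variance.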

Practice

Below are the sync tables we used for our examples. Coda allows only one instance of each sync table per doc, so both Principal Components and Loadings contain all our datasets. The datasets are separated using the Group column (see How to Use the PCA pack).

Principal Components

Group | Label | Pc1 | Pc2
Drinking | France | -1.395 | -1.619
Drinking | Italy | -1.760 | -0.808
Drinking | Switzerland | -1.102 | -0.372
Drinking | Austria | -0.332 | 1.120
Drinking | | 0.162 | 0.931
Drinking | USA | 0.445 | 0.405
Drinking | Russia | 3.409 | -2.056
Drinking | Czech Republic | 1.403 | 2.076
Drinking | Japan | -0.722 | -0.126
Drinking | Mexico | -0.108 | 0.448
MovieReviews | Thor: Love and Thunder | 1.340 | -1.202
MovieReviews | The Northman | 1.794 | 0.252
MovieReviews | Top Gun: Maverick | -0.002 | 1.339
MovieReviews | Minions: The Rise of Gru | -2.815 | -0.558
MovieReviews | Lightyear | 0.014 | -1.020
MovieReviews | The Batman | -0.332 | 1.188
HeightWeight | | -1.632 | -0.051
HeightWeight | | 1.849 | -0.753
HeightWeight | | 2.054 | 0.997
HeightWeight | | 0.992 | 0.795
HeightWeight | | 0.951 | 1.068
HeightWeight | | 0.042 | -0.505
HeightWeight | | 1.518 | 0.169
HeightWeight | | 1.297 | -0.205
HeightWeight | | -0.896 | -0.860
HeightWeight | | -0.814 | 0.039
HeightWeight | | -0.518 | 0.545
HeightWeight | | -0.894 | -0.653
HeightWeight | | 0.032 | -0.223
HeightWeight | | -0.584 | 0.021
HeightWeight | | -0.538 | -0.778
HeightWeight | | 1.900 | -0.389
HeightWeight | | -0.408 | 0.678
HeightWeight | | 1.186 | 0.676
HeightWeight | | 1.827 | -0.564
HeightWeight | | -0.487 | 0.111
HeightWeight | | 0.787 | 0.875
HeightWeight | | 1.304 | 0.626
HeightWeight | | -3.362 | -0.105
HeightWeight | | 0.306 | -0.037
HeightWeight | | 0.748 | 0.981
HeightWeight | | -0.122 | 0.417
HeightWeight | | 1.952 | -0.155
HeightWeight | | 0.088 | 0.423
HeightWeight | | -1.634 | -0.599
HeightWeight | | -1.703 | 0.126
HeightWeight | | -0.842 | -1.986
HeightWeight | | -1.162 | 0.397
HeightWeight | | -0.132 | -0.037
HeightWeight | | 1.498 | -0.434
HeightWeight | | 2.164 | -0.642
HeightWeight | | 0.550 | -0.369
HeightWeight | | 0.443 | 1.281
HeightWeight | | -0.460 | -0.249
HeightWeight | | 0.193 | 0.295
HeightWeight | | -2.634 | 0.209
HeightWeight | | 0.057 | -0.395
HeightWeight | | -1.149 | 0.870
HeightWeight | | 1.359 | 0.113
HeightWeight | | 0.607 | 0.593
HeightWeight | | -1.946 | -0.510
HeightWeight | | 0.353 | -0.172
HeightWeight | | 0.701 | 1.481
HeightWeight | | -0.706 | -0.524
HeightWeight | | 1.651 | 0.288
HeightWeight | | 0.871 | 0.040
HeightWeight | | 1.885 | 0.456
HeightWeight | | -0.279 | 0.173
HeightWeight | | 0.743 | -0.949
HeightWeight | | -0.261 | -1.100
HeightWeight | | -1.157 | 0.716
HeightWeight | | 2.035 | 0.409
HeightWeight | | 2.592 | 0.799
HeightWeight | | -0.464 | 0.564
HeightWeight | | -1.044 | 0.115
HeightWeight | | 0.240 | 0.539
HeightWeight | | -0.439 | 0.617
HeightWeight | | 0.993 | 0.228
HeightWeight | | 0.278 | 0.022
HeightWeight | | -0.248 | 0.437
HeightWeight | | 1.521 | -0.564
HeightWeight | | -0.943 | -1.140
HeightWeight | | 1.306 | 0.497
HeightWeight | | 0.247 | 0.407
HeightWeight | | -1.667 | -1.135
HeightWeight | | -0.359 | 0.064
HeightWeight | | -0.853 | 1.100
HeightWeight | | 1.455 | -0.651
HeightWeight | | 1.497 | 0.061
HeightWeight | | -2.775 | -0.108
HeightWeight | | 0.186 | -0.033
HeightWeight | | -0.989 | 0.170
HeightWeight | | 0.822 | 0.523
HeightWeight | | -0.561 | 1.239
HeightWeight | | -0.041 | -1.331
HeightWeight | | -0.358 | -0.198
HeightWeight | | 0.689 | 0.186
HeightWeight | | -0.741 | 0.112
HeightWeight | | 2.430 | 0.899
HeightWeight | | -0.507 | 0.710
HeightWeight | | 0.567 | -0.345
HeightWeight | | 1.154 | 0.294
HeightWeight | | 0.593 | 1.023
HeightWeight | | 1.038 | -0.522
HeightWeight | | 0.452 | -1.173
HeightWeight | | 0.356 | 0.151
HeightWeight | | 0.617 | 0.486
HeightWeight | | 1.678 | 0.009
HeightWeight | | 2.082 | -0.499
HeightWeight | | -0.436 | -1.347
HeightWeight | | 1.138 | -0.400
HeightWeight | | 1.222 | -0.681
HeightWeight | | -1.030 | 0.180
HeightWeight | | -1.891 | 1.403
HeightWeight | | -0.376 | 0.484
HeightWeight | 100 | -0.352 | -1.037
HeightWeight | 101 | -2.608 | -0.363
HeightWeight | 102 | -0.128 | 0.498
HeightWeight | 103 | 0.557 | 0.266
HeightWeight | 104 | -2.542 | -0.837
HeightWeight | 105 | -0.818 | -0.679
HeightWeight | 106 | -0.195 | -0.246
HeightWeight | 107 | -0.352 | -0.141
HeightWeight | 108 | -0.641 | 0.554
HeightWeight | 109 | 0.614 | -0.472
HeightWeight | 110 | -1.500 | 1.496
HeightWeight | 111 | -0.389 | -0.221
HeightWeight | 112 | 0.788 | -0.742
HeightWeight | 113 | 1.078 | -0.555
HeightWeight | 114 | -1.635 | -0.177
HeightWeight | 115 | 0.444 | -1.057
HeightWeight | 116 | 0.319 | -0.162
HeightWeight | 117 | 0.636 | 0.009
HeightWeight | 118 | 1.182 | -0.137
HeightWeight | 119 | -1.574 | 0.240
HeightWeight | 120 | 0.893 | 0.251
HeightWeight | 121 | -1.217 | 0.349
HeightWeight | 122 | -1.346 | -0.807
HeightWeight | 123 | 0.598 | -0.474
HeightWeight | 124 | 0.838 | 0.116
HeightWeight | 125 | -1.207 | -1.200
HeightWeight | 126 | -1.078 | 0.613
HeightWeight | 127 | -1.575 | -0.519
HeightWeight | 128 | -0.475 | -0.461
HeightWeight | 129 | 1.112 | 0.427
HeightWeight | 130 | 1.234 | -0.435
HeightWeight | 131 | 0.433 | 0.200
HeightWeight | 132 | 1.401 | -0.990
HeightWeight | 133 | -0.270 | -1.109
HeightWeight | 134 | -1.497 | 1.091
HeightWeight | 135 | 1.202 | -1.094
HeightWeight | 136 | 0.615 | 0.418
HeightWeight | 137 | -1.366 | 0.114
HeightWeight | 138 | -0.210 | 0.162
HeightWeight | 139 | 3.597 | -0.740
HeightWeight | 140 | 0.928 | -0.552
HeightWeight | 141 | 1.104 | -0.041
HeightWeight | 142 | -1.826 | 0.192
HeightWeight | 143 | -0.090 | -0.134
HeightWeight | 144 | 0.163 | -0.121
HeightWeight | 145 | -2.195 | -0.177
HeightWeight | 146 | -0.523 | -0.749
HeightWeight | 147 | 0.617 | 0.165
HeightWeight | 148 | -2.104 | 0.410
HeightWeight | 149 | 0.486 | -0.068
HeightWeight | 150 | 0.966 | 0.280
HeightWeight | 151 | 0.472 | -0.687
HeightWeight | 152 | 0.465 | 0.865
HeightWeight | 153 | -0.914 | 0.157
HeightWeight | 154 | 0.057 | 1.471
HeightWeight | 155 | 2.199 | -1.074
HeightWeight | 156 | -2.640 | 0.144
HeightWeight | 157 | 2.981 | 0.772
HeightWeight | 158 | -1.254 | -0.926
HeightWeight | 159 | 2.167 | -0.800
HeightWeight | 160 | -1.631 | 0.293
HeightWeight | 161 | 0.810 | 1.444
HeightWeight | 162 | -2.322 | 0.272
HeightWeight | 163 | 1.217 | 0.182
HeightWeight | 164 | -0.276 | -0.590
HeightWeight | 165 | -0.898 | 1.026
HeightWeight | 166 | -0.294 | 0.296
HeightWeight | 167 | -0.549 | -0.869
HeightWeight | 168 | -0.748 | 0.462
HeightWeight | 169 | -0.293 | 0.261
HeightWeight | 170 | -1.657 | -0.229
HeightWeight | 171 | 0.267 | -0.812
HeightWeight | 172 | -0.171 | -0.185
HeightWeight | 173 | -0.221 | -0.083
HeightWeight | 174 | -1.428 | 0.518
HeightWeight | 175 | 2.857 | -1.429
HeightWeight | 176 | -1.739 | -0.908
HeightWeight | 177 | -0.633 | 0.139
HeightWeight | 178 | -1.298 | 0.313
HeightWeight | 179 | -1.057 | 0.379
HeightWeight | 180 | -0.970 | 0.028
HeightWeight | 181 | 0.155 | 0.767
HeightWeight | 182 | -1.408 | 0.108
HeightWeight | 183 | -1.493 | -0.641
HeightWeight | 184 | 0.110 | -0.465
HeightWeight | 185 | 0.197 | 1.451
HeightWeight | 186 | -0.611 | 0.839
HeightWeight | 187 | 0.268 | -0.191
HeightWeight | 188 | -0.868 | 0.123
HeightWeight | 189 | -0.332 | 0.382
HeightWeight | 190 | -1.475 | -1.570
HeightWeight | 191 | 1.867 | 0.234
HeightWeight | 192 | -1.847 | -0.076
HeightWeight | 193 | 0.155 | 0.512
HeightWeight | 194 | 0.934 | 1.327
HeightWeight | 195 | 1.709 | 0.463
HeightWeight | 196 | -1.161 | 0.406
HeightWeight | 197 | -1.347 | -0.006
HeightWeight | 198 | 0.169 | -0.042
HeightWeight | 199 | 0.040 | -0.011
HeightWeight | 200 | 1.293 | -1.215

Loadings

Group | Variable Name | Principal Component1 | Principal Component2 | Principal Component3 | Principal Component4 | Principal Component5 | Principal Component6
Drinking | Spirits | 0.35 | -0.57 | -0.214 | -0.635 | -0.329 |
Drinking | Wine | -0.45 | -0.38 | -0.618 | 0.448 | -0.276 |
Drinking | Beer | 0.07 | 0.72 | -0.425 | -0.207 | -0.497 |
Drinking | Life Expectancy | -0.58 | 0.09 | -0.269 | -0.567 | 0.506 |
Drinking | Heart Disease Rate | 0.58 | 0.04 | -0.565 | 0.177 | 0.559 |
Drinking | Drinking Percentage Explained | 0.46 | 0.78 | 0.898 | 0.983 | 1.000 |
MovieReviews | Variety | 0.58 | -0.20 | 0.650 | 0.449 | 0.000 |
MovieReviews | The New York Times | 0.57 | 0.22 | -0.698 | 0.369 | 0.000 |
MovieReviews | Vanity Fair | -0.44 | 0.63 | 0.162 | 0.617 | 0.000 |
MovieReviews | RogerEbert.com | -0.37 | -0.72 | -0.253 | 0.531 | 0.000 |
MovieReviews | MovieReviews Percentage Explained | 0.65 | 0.96 | 1.000 | | |
HeightWeight | Height | 0.71 | -0.71 | 0.000 | | |
HeightWeight | Weight | 0.71 | | 0.000 | | |
HeightWeight | HeightWeight Percentage Explained | 0.78 | 1.00 | 1.000 | | |

Variable Names Drinking

Name | Description
Spirits | in litres, per year
Wine | in litres, per year
Beer | in litres, per year
Life Expectancy | in years
Heart Disease Rate | per 100,000 people

Variable Names Movie Reviews

Name | Description
Variety |
The New York Times |
Vanity Fair |
RogerEbert.com |

Variable Names Height vs Weight

Name | Description
Height | in inches
Weight | in pounds

Bonus for reaching the end of the last page! (video posted on Reddit by u/PR0CR45T184T0R)
