Principal Components Analysis

Data with a lot of dimensions get really noisy.
What do you think would be a logical way of reducing the noise?
Actually, that would add more noise. For less noise, use fewer dimensions.
Of course!
How would the composite variables ideally be correlated with each other?
Exactly. You group the correlated variables together so that you end up with only uncorrelated composite variables.
No, they wouldn't. If that were the case, they could be combined further to reduce the dimensions even more.
No, it would matter. Consider the point of PCA in the first place and what it accomplishes.
Suppose for a moment that you had 10 dimensions to start with. How many correlations do you think PCA would need to consider?
No, more than that. Consider that with two dimensions you would have just one correlation between them; with three dimensions, you already have three correlations.
Not quite. Consider that with two dimensions you would have just one correlation between them; with three dimensions, you already have three correlations. By the time you get to 7, you would have $$\frac{7(7-1)}{2} = 21$$ correlations.
Yes. It's $$\frac{n(n-1)}{2}$$ if you're curious. That's a lot of correlations to consider with that many dimensions, so a covariance matrix is used to capture them all at once. Linear combinations of the features are then made to define the composite variables, based on something called eigenvectors (*eigen* is German for *own*).
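As a minimal sketch (assuming Python with numpy, on made-up standardized data), the correlation count and the eigendecomposition of the covariance matrix look like this:

```python
import numpy as np

n_features = 10
# Pairwise correlations: n(n-1)/2 -> 45 for 10 dimensions
print(n_features * (n_features - 1) // 2)

# Toy standardized data: 200 observations, 10 features
rng = np.random.default_rng(0)
X = rng.standard_normal((200, n_features))

# Covariance matrix of the features (features are columns)
cov = np.cov(X, rowvar=False)

# Eigenvectors define the composite variables; eigenvalues
# measure how much variance each one explains
eigenvalues, eigenvectors = np.linalg.eigh(cov)
```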
Each eigenvector's eigenvalue shows how much of the total variance its composite variable explains. Based on this, which composite variable would be most important?
Absolutely. The higher the eigenvalue, the greater the explanatory power.
No, you'd want the one with the highest eigenvalue, since that is the one with the greatest explanatory power.
The next step is ordering the eigenvectors (composite variables) from highest to lowest eigenvalue. The algorithm then keeps the variables with the most explanatory power; just those top few principal components explain most of the variation. The top variable gets the name "PC1."
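Continuing the numpy sketch above, ordering the eigenpairs from highest to lowest eigenvalue is a one-line sort; the top column is then PC1:

```python
# np.linalg.eigh returns eigenvalues in ascending order,
# so reverse to put the largest (PC1) first
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# PC1: the linear combination with the greatest explanatory power
pc1_scores = X @ eigenvectors[:, 0]
```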
In PCA, what similar distance would you expect to be of importance?
Precisely. This is the **projection error**. As with OLS, the algorithm sums all the projection errors and minimizes them. Then it's on to PC2, the next-best eigenvector, to explain what's left, then PC3, and so on until roughly 85–95% of the variation is explained. How that variation is split across the principal components is often displayed on a visual tool called a scree plot (a code sketch follows this question's feedback). Then you're done. Well, the computer is done; this is unsupervised learning.
No, those are different kinds of values measured on different scales.
No, the eigenvalues are descriptors of the eigenvectors themselves in the context of the data.
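To make the variance-explained bookkeeping concrete, here is a continuation of the same sketch; the 90% threshold is just one choice within the 85–95% convention mentioned above:

```python
import matplotlib.pyplot as plt

# Fraction of total variance explained by each component
explained = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained)

# Smallest number of components reaching the chosen threshold
k = int(np.searchsorted(cumulative, 0.90)) + 1

# A scree plot is just these fractions against component number
plt.plot(range(1, len(explained) + 1), explained, marker="o")
plt.xlabel("Principal component")
plt.ylabel("Fraction of variance explained")
plt.show()
```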
To summarize: [[summary]]
How might you describe the eigenvectors as this process continues?
That's surprising. Most would agree that they're uninterpretable.
That's one downside of PCA. The composite variables you end up with through this process are fairly abstract; you can't really label them or even understand what they mean in their composite form. But at least they can be useful once their eigenvalues are calculated.
You're not alone!
Dimension reduction is a big part of unsupervised machine learning. **Principal components analysis (PCA)** is a prime method; you search for variables that are highly correlated and combine them into composite variables.
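In practice you would rarely do the eigendecomposition by hand; a hedged sketch using scikit-learn's PCA on made-up data shows the whole procedure in a few calls:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))

# Standardize first so no single feature dominates the variance
X_std = StandardScaler().fit_transform(X)

# A float n_components keeps enough components to explain
# that fraction of the variance (here 90%)
pca = PCA(n_components=0.90)
scores = pca.fit_transform(X_std)      # the composite variables (PC scores)
print(pca.explained_variance_ratio_)   # variance fractions, largest first
```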
Finally, error. Recall ordinary least squares (OLS) regression, which minimizes the distances between the line and the data points.
Adding more dimensions
Reducing the dimensions
They would be uncorrelated
They would be highly correlated
Correlation wouldn't matter
9
20
45
The one with the largest eigenvalue
The one with the smallest eigenvalue
From data points to PC1
From data points to eigenvalues
From eigenvalues to eigenvectors
Continue
Intuitive
Uninterpretable
Simple
