Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms a dataset into a lower-dimensional space while retaining as much variance as possible.

Key Pointers

What is a linear combination?

A linear combination means taking your existing columns of data, multiplying each one by a specific weight (a number), and adding the results together to create a brand new column.
In PCA, each principal component is such a new column, with the weights chosen so that the column captures as much variance as possible.
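
As a minimal sketch of the idea, the snippet below builds one new column from two existing ones. The column values, the weights, and the helper name `linear_combination` are made up for illustration; in PCA the weights would be computed by the algorithm rather than chosen by hand.

```python
# Two existing columns (hypothetical values, one entry per row of the dataset).
col_a = [1.0, 2.0, 3.0]
col_b = [4.0, 5.0, 6.0]

# One weight per column (hypothetical; PCA would compute these).
weights = [0.6, 0.8]


def linear_combination(columns, weights):
    """Return the row-by-row weighted sum of the given columns."""
    return [
        sum(w * value for w, value in zip(weights, row))
        for row in zip(*columns)
    ]


# Each entry is 0.6 * col_a[i] + 0.8 * col_b[i].
new_column = linear_combination([col_a, col_b], weights)
print(new_column)
```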

What Does "Transforms a Dataset into a Lower-Dimensional Space" Mean?

When we say that PCA transforms a dataset into a lower-dimensional space, it means that we reduce the number of features while preserving as much important information as possible. Instead of selecting or eliminating individual features, PCA creates new features (Principal Components) that are combinations of the original features.


The Scenario: Student Test Scores

The scenario below illustrates what a principal component and a linear combination mean in practice.

Imagine you have a dataset of high school students with three original features (variables):

  1. Math score (X1)
  2. Physics score (X2)
  3. Literature score (X3)

The First Principal Component (PC1)

When you run PCA, the algorithm looks for the largest pattern (the maximum variance) in the data. It realizes that Math and Physics move together, so it creates the First Principal Component by assigning heavy weights to those two subjects and a near-zero weight to Literature.

The mathematical linear combination for PC1 might look like this:

PC1=(0.70×X1)+(0.70×X2)+(0.05×X3)

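What this means: PC1 is dominated by Math (X1) and Physics (X2), which receive equal, heavy weights, while Literature (X3) barely contributes. A student's PC1 score is therefore essentially a combined Math-and-Physics score.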

The Second Principal Component (PC2)

PCA then looks for the next biggest pattern that is entirely unrelated (orthogonal) to the first one. It creates a second linear combination.

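PC2=(-0.04×X1)+(-0.04×X2)+(1.00×X3)

Note that the PC1 and PC2 weight vectors are (approximately) orthogonal: (0.70×-0.04)+(0.70×-0.04)+(0.05×1.00)≈0.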

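What this means: PC2 is driven almost entirely by Literature (X3), so it captures the variation in Literature scores that PC1 does not account for.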


Step-by-Step Computation:

  1. Standardization:
    PCA is sensitive to the scale of the data, so subtract each column's mean from every value so that the new mean is zero (μ=0). When the features are measured on different scales, also divide each column by its standard deviation so that every feature has unit variance (σ=1).
    Set up the standardized data in a matrix in which each row is an observation and each column is a feature; there can be no missing data.
  2. Covariance Matrix:
    Compute the covariance between every pair of features from the standardized data matrix.
  3. Eigen-Decomposition:
    Compute the eigenvalues (the amount of variance along each direction) and eigenvectors (the PC directions) of the covariance matrix.
  4. Sort & Select:
    Sort the eigenvectors in descending order of their corresponding eigenvalues.
    Select the top k eigenvectors that correspond to the largest eigenvalues, where k is the desired number of principal components. (Do this step only if you need to reduce dimensionality, as it will eliminate information from the data.)
  5. Project:
    Project the data onto the k selected eigenvectors to obtain the reduced-dimensional representation.
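
The five steps above can be sketched in NumPy as follows. This is one possible implementation rather than a reference one: the function name `pca`, its `scale` flag, and the sample matrix are assumptions made for illustration, and the code presumes no missing values and no constant columns.

```python
import numpy as np


def pca(X, k, scale=True):
    """Project the rows of X onto the top-k principal components.

    X     : 2-D array, one row per observation, one column per feature.
    k     : number of principal components to keep.
    scale : if True, divide each centered column by its standard deviation.
    """
    X = np.asarray(X, dtype=float)

    # 1. Standardization: center each column, optionally scale to unit variance.
    Z = X - X.mean(axis=0)
    if scale:
        Z = Z / X.std(axis=0, ddof=1)

    # 2. Covariance matrix of the features (columns are the variables).
    cov = np.cov(Z, rowvar=False)

    # 3. Eigen-decomposition; eigh is intended for symmetric matrices
    #    and returns the eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # 4. Sort in descending order of eigenvalue and keep the top k directions.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order[:k]]

    # 5. Project the standardized data onto the selected eigenvectors.
    scores = Z @ components
    return scores, components, eigenvalues


# Hypothetical data: 5 observations, 3 features.
X = np.array([
    [2.5, 2.4, 1.0],
    [0.5, 0.7, 3.0],
    [2.2, 2.9, 2.0],
    [1.9, 2.2, 5.0],
    [3.1, 3.0, 4.0],
])
scores, components, eigenvalues = pca(X, k=2)
print(scores.shape)  # (5, 2)
```

The sign of each eigenvector is arbitrary, so the scores may come out negated relative to another implementation; the variance captured by each component is unaffected.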

How to Choose k?
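
The notes leave this question open. One widely used heuristic, offered here as an assumption rather than as the author's recommendation, is to keep the smallest k whose components together explain at least a chosen share of the total variance. The function name `choose_k`, the threshold values, and the eigenvalues below are all hypothetical.

```python
def choose_k(eigenvalues, threshold=0.95):
    """Return the smallest k whose top-k eigenvalues reach `threshold`
    of the total variance (the sum of all eigenvalues)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues for a 4-feature dataset (total variance 4.0).
example_eigenvalues = [2.5, 1.0, 0.4, 0.1]

# Cumulative shares are 0.625, 0.875, 0.975, 1.0, so 90% needs 3 components.
print(choose_k(example_eigenvalues, threshold=0.90))  # 3
```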


The Scenario: Wine quality

Imagine we have two features for 3 bottles of wine:

  1. Alcohol Content (X1)
  2. Color Intensity (X2)

| Bottle | X1 (Alcohol) | X2 (Color) |
|---|---|---|
| A | 10 | 2 |
| B | 20 | 8 |
| C | 30 | 5 |

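
Step 1: Standardize the Data (Mean Centering)

To keep the arithmetic simple, this example only mean-centers each column and does not divide by the standard deviation. The column means are 20 for X1 and 5 for X2.

| Bottle | X1 (Alcohol) | X2 (Color) |
|---|---|---|
| A | 10 - 20 = -10 | 2 - 5 = -3 |
| B | 20 - 20 = 0 | 8 - 5 = 3 |
| C | 30 - 20 = 10 | 5 - 5 = 0 |

Step 2: Calculate the Covariance Matrix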

We want to see how X1 and X2 vary together. We use the formula

Cov(X,Y)=(xix¯)(yiy¯)n1 [10015159]
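
The entries of this matrix can be reproduced with a few lines of plain Python. The sketch below applies the sample covariance formula (n - 1 denominator) to the three bottles; the helper name `sample_cov` is an illustrative choice.

```python
# Wine data from the table: one value per bottle (A, B, C).
alcohol = [10, 20, 30]
color = [2, 8, 5]


def sample_cov(xs, ys):
    """Sample covariance of two equal-length lists, using n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# The diagonal holds the variances, the off-diagonal the covariance.
cov_matrix = [
    [sample_cov(alcohol, alcohol), sample_cov(alcohol, color)],
    [sample_cov(color, alcohol), sample_cov(color, color)],
]
print(cov_matrix)  # [[100.0, 15.0], [15.0, 9.0]]
```
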

Step 3.a: Calculate Eigenvalues (λ)

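We solve the characteristic equation: det(Σ - λI) = 0.

$$\det\begin{bmatrix} 100 - \lambda & 15 \\ 15 & 9 - \lambda \end{bmatrix} = 0$$

$$(100 - \lambda)(9 - \lambda) - (15 \times 15) = 0$$

$$\lambda^2 - 109\lambda + 675 = 0$$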

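Using the quadratic formula, we find:

$$\lambda = \frac{109 \pm \sqrt{109^2 - 4 \times 675}}{2} = \frac{109 \pm \sqrt{9181}}{2}$$

so λ1 ≈ 102.4 and λ2 ≈ 6.6.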
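
The roots can be checked numerically. The sketch below applies the quadratic formula to the characteristic equation of the 2×2 covariance matrix; the variable names are illustrative.

```python
import math

# Covariance matrix from Step 2, written as [[a, b], [b, d]].
a, b, d = 100.0, 15.0, 9.0

# Characteristic polynomial: lambda^2 - (a + d) * lambda + (a * d - b * b) = 0
trace = a + d                # 109
determinant = a * d - b * b  # 675
root = math.sqrt(trace ** 2 - 4 * determinant)

lambda_1 = (trace + root) / 2
lambda_2 = (trace - root) / 2
print(round(lambda_1, 2), round(lambda_2, 2))  # 102.41 6.59
```

The two eigenvalues add up to 109, the sum of the two variances on the diagonal of the covariance matrix.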

Step 3.b: Calculate Eigenvectors (v)

We plug λ1=102.4 back into (Σ - λI)v = 0 to find the direction.

$$(100 - 102.4)x + 15y = 0$$

$$-2.4x + 15y = 0$$

$$y = 0.16x$$

After normalizing (so the vector length is 1), our Eigenvector 1 is approximately:

v1=[0.987,0.158]

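Interpretation: PC1 is the weighted sum 0.987 × Alcohol + 0.158 × Color (both mean-centered). The weights are not percentages; it is their squares that sum to 1 (0.987² ≈ 0.975 and 0.158² ≈ 0.025), so the PC1 direction is dominated by Alcohol.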
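
The normalization step can be reproduced as follows. This is a small sketch that starts from the rounded eigenvalue 102.4 used in the text and solves the first row of (Σ - λI)v = 0; the variable names are illustrative.

```python
import math

# Covariance matrix entries (first row) and the largest eigenvalue, rounded.
a, b = 100.0, 15.0
lambda_1 = 102.4

# First row of (Sigma - lambda * I) v = 0:  (a - lambda_1) * x + b * y = 0
slope = (lambda_1 - a) / b  # y = slope * x, about 0.16

# Normalize the direction (1, slope) to unit length.
length = math.hypot(1.0, slope)
v1 = (1.0 / length, slope / length)
print(round(v1[0], 3), round(v1[1], 3))  # 0.987 0.158

# The squared weights sum to 1.
print(round(v1[0] ** 2, 3), round(v1[1] ** 2, 3))  # 0.975 0.025
```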

Step 4: Project the Data onto the New PC

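Now we transform our original centered points into their new 1D "PC score" by taking the dot product of each centered row with v1: PC1 = Xcentered · v1.

  A: (-10 × 0.987) + (-3 × 0.158) = -10.34
  B: (0 × 0.987) + (3 × 0.158) = 0.47
  C: (10 × 0.987) + (0 × 0.158) = 9.87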
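
The projection is just one dot product per bottle. A minimal sketch, using the centered values from Step 1 and the rounded eigenvector v1 = [0.987, 0.158] (the dictionary layout is an illustrative choice):

```python
# Mean-centered wine data (alcohol, color) for each bottle.
centered = {
    "A": (-10.0, -3.0),
    "B": (0.0, 3.0),
    "C": (10.0, 0.0),
}
v1 = (0.987, 0.158)

# Each PC1 score is the dot product of a centered row with v1.
scores = {
    bottle: x1 * v1[0] + x2 * v1[1]
    for bottle, (x1, x2) in centered.items()
}
for bottle, score in scores.items():
    print(bottle, round(score, 2))
# A -10.34
# B 0.47
# C 9.87
```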

Final Result

We have successfully reduced our 2-column dataset into a single column (PC1):

| Bottle | Original Features (2D) | PCA Feature (1D) |
|---|---|---|
| A | (10, 2) | -10.34 |
| B | (20, 8) | 0.47 |
| C | (30, 5) | 9.87 |

Interpretation: These three numbers now summarize each bottle in a single value. You can use this single column in a machine learning model, knowing that it retains about 94% of the total variance (λ1 divided by the total variance: 102.4 / (100 + 9) ≈ 0.94) that originally required two columns.


Advantages:

Limitations: