Principal Component Analysis (PCA)

Introduction

  • Goal

To perform dimensionality reduction on an input data set, i.e. to project the original input data matrix $M$ (of size $n \times m$) onto a lower-dimensional matrix $N$ (of size $n \times p$), where $p < m$. Why do we need this? Some features are not useful for the current learning problem, and keeping them only increases the computational cost of learning. Dimensionality reduction lets us shrink the original dimension to a smaller one without losing the important feature directions. A minimal sketch of this projection step is shown below.
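As a rough illustration of the shapes involved (the matrix names and the random placeholder directions below are just for this sketch; for real PCA, the columns of `W` would be the top-$p$ principal components derived later in this note):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 100, 10, 3

M = rng.normal(size=(n, m))                    # original data: n samples, m features
W = np.linalg.qr(rng.normal(size=(m, p)))[0]   # placeholder orthonormal projection directions (m x p)

N = M @ W                                      # projected data: n samples, p features
print(M.shape, "->", N.shape)                  # (100, 10) -> (100, 3)
```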

  • Intuition
    • PCA (Maximum Projection Variance): prefers the projection direction along which the projected data has the largest variance
    • PCA (Minimum Projection Cost): prefers the projection direction with the smallest residual reconstruction cost
    • Independent Component Analysis (ICA): prefers components that are statistically independent

PCA (Maximum Projection Variance)

Given a data matrix $X$ with $N$ data points of dimension $p$ (one data point per row), find $W \in \mathbb{R}^p$ such that:
$$
\Vert W\Vert = 1, \quad W~\text{maximizes}~Var(XW)
$$

$$
Var(XW) = \frac{1}{N}\,W^{\top}(X-\bar{X})^{\top}(X-\bar{X})W = W^{\top}SW
$$

where $S$ is the sample covariance matrix of $X$:
$$
S = \frac{1}{N}(X-\bar{X})^{\top}(X-\bar{X})
$$
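The identity $Var(XW) = W^{\top}SW$ can be checked numerically; this is a minimal sketch on toy data, and the variable names are illustrative rather than taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 500, 4

X = rng.normal(size=(N, p)) @ rng.normal(size=(p, p))  # correlated toy data
Xc = X - X.mean(axis=0)                                 # center the data

S = (Xc.T @ Xc) / N                # sample covariance matrix (1/N convention, as above)

W = rng.normal(size=p)
W /= np.linalg.norm(W)             # unit-norm projection direction

var_proj = np.mean((Xc @ W) ** 2)  # variance of the projected (centered) data
print(np.isclose(var_proj, W @ S @ W))  # True: Var(XW) == W^T S W
```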
The first principal component $w_1$ is simply the eigenvector of $S$ corresponding to its largest eigenvalue. The $k$th principal component $w_k$ is the eigenvector of $S$ corresponding to the $k$th largest eigenvalue, and is orthogonal to $w_1, \dots, w_{k-1}$.
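A minimal sketch of PCA by eigendecomposition of $S$, assuming rows of `X` are data points; the function name `pca` and the toy data are illustrative, not from the original text.

```python
import numpy as np

def pca(X, k):
    """Project X (N x p, rows are data points) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                    # center the data
    S = (Xc.T @ Xc) / X.shape[0]               # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)       # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues in descending order
    W = eigvecs[:, order[:k]]                  # top-k eigenvectors as columns
    return Xc @ W, W, eigvals[order[:k]]

# toy usage: reduce 5-dimensional data to 2 principal components
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
Z, W, variances = pca(X, k=2)
print(Z.shape)       # (200, 2): projected data
print(variances)     # projection variance captured by each component
```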