Principal Component Analysis (PCA)

NOTE

This is part of the 10th homework, titled Dimensionality Reduction 1, for the course Machine Learning (IN2064) in the Winter Semester 2024/25 at TUM.

Problem 3

Let the matrix $X \in R^{N \times D}$ represent $N$ data points of dimension $D = 10$ (samples stored as rows). We applied PCA to $X$ . By using the $K = 5$ top principal components, we transformed/projected $X$ into $\tilde{X} \in R^{N \times K}$ . We computed that $\tilde{X}$ preserves 70% of the variance of the original data $X$ .

Suppose now we apply PCA on the following matrices:

a) $Y_{1} = XS$ where $S = λ I$ , with $λ \in R$ and $I \in R^{D \times D}$ is the identity matrix
b) $Y_{2} = XR$ where $R \in R^{D \times D}$ and $R R^{T} = I$
c) $Y_{3} = XP$ where $P = diag (+ 5, - 5, \dots, + 5, - 5)$ is a $D \times D$ diagonal matrix
d) $Y_{4} = XQ$ where $Q = diag (1, 2, 3, \dots, D - 1, D)$ is a $D \times D$ diagonal matrix
e) $Y_{5} = X + 1_{N} μ^{T}$ where $μ \in R^{D}$ and $1_{N}$ is an $N$ -dimensional column vector of all ones
f) $Y_{6} = XA$ where $A \in R^{D \times D}$ and $rank (A) = 5$

and obtain the projected data $\tilde{Y}_{1}, \dots, \tilde{Y}_{6} \in R^{N \times K}$ using the principal components corresponding to the top $K = 5$ largest eigenvalues of the respective $Y_{i}$ . What fraction of variance of each $Y_{i}$ will be preserved by each respective $\tilde{Y}_{i}$ ? Justify your answer. The answer “cannot tell without additional information” is also valid if you provide a justification.

Solution:

a) $Y_{1} = XS$ where $S = λ I$ , with $λ \in R$ and $I \in R^{D \times D}$ is the identity matrix

The goal of the PCA is to decompose the data $X$ into $ΓΛ Γ^{⊤}$ where $Γ$ is the matrix of eigenvectors and $Λ$ is diagonal matrix of the corresponding eigenvalues ordered in a decreasing order after taking their absolute value. As such PCA decomposition of $Y_{1}$ would go as follows:

Y_{1} = X λ I = ΓΛ Γ^{⊤} λ I = Γ λ Λ Γ^{⊤}

Now, when we do dimensionality reduction by leaving zeroing all but first $K$ columns of $Λ$ . Recall, that the eigenvalues correspond to the variance on a given dimension. Given that the sum of top $K$ eigenvalues of $X$ was equal to the 70% of the all eigenvalues, then after multiplication by a constant $λ$ this property will still hold. As, such we can conclude that $\tilde{Y_{1}}$ will preserve also 70% of the variance.

b) $Y_{2} = XR$ where $R \in R^{D \times D}$ and $R R^{T} = I$

The description of the matrix $R$ tells as that is an orthogonal matrix. We know that, orthogonal linear maps preserve the inner product of vectors. So they do not change the eigenvalues, as no scaling happens. Therefore, again $\tilde{Y_{2}}$ will preserve the same proportion of variance as does $\tilde{X}$ which in the case of this question is 70%.

c) $Y_{3} = XP$ where $P = diag (+ 5, - 5, \dots, + 5, - 5)$ is a $D \times D$ diagonal matrix

The linear map $P$ scales every dimension of the input space by five and inverts the evenly indexed ones. This has no impact on the magnitude of eigenvalues other than the one discussed in the point a) and as such the same proportion of variance is preserved as in the case of $\tilde{X}$ .

d) $Y_{4} = XQ$ where $Q = diag (1, 2, 3, \dots, D - 1, D)$ is a $D \times D$ diagonal matrix

Here linear map $Q$ scales each dimension by a different factor, this in turn scales eigenvalues by different factors too and thus we cannot tell what proportion of variance will be ultimately preserved.

e) $Y_{5} = X + 1_{N} μ^{T}$ where $μ \in R^{D}$ and $1_{N}$ is an $N$ -dimensional column vector of all ones

The addition of $1_{N} μ^{T}$ doesn’t impact the PCA decomposition as the first step of the PCA method is to center the data.

f) $Y_{6} = XA$ where $A \in R^{D \times D}$ and $rank (A) = 5$

Here, we are multiplying by a matrix A which is of rank 5, this results in a rank of $Y_{6}$ being at most 5. Thus it will have at most 5 eigenvectors and therefore if we decrease dimensionality of the data to 5 then we can’t lose variance, i.e. we are preservering 100% of it.

Problem 4

You are given $N = 4$ data points: ${x_{i}}_{i = 1}^{4}, x_{i} \in R^{3}$ , represented with the matrix $X \in R^{4 \times 3}$ .

X = 424 - 2 31 - 1 1 2 - 2 22

Hint: In this task the results of all (final and intermediate) computations happen to be integers.

a) Perform principal component analysis (PCA) of the data $X$ , i.e. find the principal components and their associated variances in the transformed coordinate system. Show your work.
b) Project the data to two dimensions, i.e. write down the transformed data matrix $Y \in R^{4 \times 2}$ using the top-2 principal components you computed in (a). What fraction of variance of $X$ is preserved by $Y$ ?
c) Let $x_{5} \in R^{3}$ be a new data point. Specify the vector $x_{5}$ such that performing PCA on the data including the new data point ${x_{i}}_{i = 1}^{5}$ leads to exactly the same principal components as in (a).

Solution

a) Perform principal component analysis (PCA) of the data $X$ , i.e. find the principal components and their associated variances in the transformed coordinate system. Show your work.

Step 1: Center the data: Each row is in the matrix $X$ is a data-point. As such, we center the data by subtracting an average data-point from row: $\hat{X} = 424 - 2 31 - 1 1 2 - 2 22 - 1111 [211] = 202 - 4 20 - 2 0 1 - 3 11$

Step 2: Compute covariance matrix: Recall covariance between two random variables is given by:

co v_{x, y} = \frac{\sum ( x _{i} - x ^ ) ( y _{j} - y ^ )}{N - 1}

Then a covariance matrix is given as follows:

Σ_{X} = Var (X_{1}) Cov (X_{2}, X_{1}) ⋮ Cov (X_{d}, X_{1}) Cov (X_{1}, X_{2}) Var (X_{2}) ⋮ \dots \dots \dots ⋱ \dots Cov (X_{1}, X_{d}) ⋮ ⋮ Var (X_{d}) = \frac{1}{N} X^{T} X - \hat{X} \hat{X}^{T}

We will use the right most part of the equality, especially since we can disregard $\hat{X} \hat{X}^{T}$ having centered the data in the step one. Thus,

Σ_{\hat{X}} = \frac{1}{N} X^{T} X = = \frac{1}{4} 221 00 - 3 2 - 2 1 - 4 01 202 - 4 20 - 2 0 1 - 3 11 = = \frac{1}{4} 4 + 0 + 4 + 16 4 + 0 - 4 + 0 2 + 0 + 2 - 4 4 + 0 - 4 + 0 4 + 0 + 4 + 0 2 + 0 - 3 + 0 2 + 0 + 2 - 4 2 + 0 - 2 + 0 1 + 9 + 1 + 1 = = \frac{1}{4} 24000800012 = = 600020003

Step 3: find the eigenvector as and perform the decomposition. Normally we would use the power Iteration a.k.a Von Mises iteration, however since our covariance matrix is already diagonal we get the eigenvectors and their eigenvalues for free! The eigenvalues:

λ_{1} = 6, λ_{2} = 2, λ_{3} = 3

and their corresponding eigenvectors:

v_{1} = 100, v_{2} = 010, v_{3} = 001

One thing to remember when decomposing $X$ into $c$ is that the eigenvalues in the $Λ$ matrix have to ordered in decreasing oder with respect to their absolute values. So the final decomposition is given by:

Σ_{\hat{X}} = ΓΛ Γ^{T} = 100001010600030002100001010

b) Project the data to two dimensions, i.e. write down the transformed data matrix $Y \in R^{4 \times 2}$ using the top-2 principal components you computed in (a). What fraction of variance of $X$ is preserved by $Y$ ? The projection comes down to multiplication of the centered data with the first two columns of the of the $Γ$ matrix:

Y = \hat{X} Γ = 202 - 4 20 - 2 0 1 - 3 11 100001 = 202 - 4 1 - 3 11

Now, the $Λ$ matrix is a covariance matrix of the new space. We can use this fact with to calculate the percentage of preserved variance:

Total variance Preserved variance Fraction of preserved variance = λ_{1} + λ_{2} + λ_{3} = 11 = λ_{1} + λ_{3} = 9 = \frac{9}{11} \approx 82%

c) Let $x_{5} \in R^{3}$ be a new data point. Specify the vector $x_{5}$ such that performing PCA on the data including the new data point ${x_{i}}_{i = 1}^{5}$ leads to exactly the same principal components as in (a).

The idea here is to choose a vector such that it doesn’t change the eigenvalues of the covariance matrix, i.e. at most it scales the values of the covariance matrix. We can do that by choosing the vector that is the mean of the all the pre-existing vectors. In our case that would be:

x_{5} = [211]

Then after centering it will be zeroed-out, and thus it will not impact covariance matrix other than rescaling its values as we have increased number of data-points by one so the preceding scaling factor of the covariance matrix is no longer $\frac{1}{4}$ but $\frac{1}{5}$ .

aderylo: field-report

Explorer

Principal Component Analysis (PCA)

Problem 3

Solution:

Problem 4

Solution

Graph View

Table of Contents

Backlinks