vampire.amath._pca_svd

vampire.amath._pca_svd(A)

Principal component analysis of matrix A by singular value decomposition.

Returns loadings, principal components, and explained variance.

Parameters:
A : ndarray

Matrix with shape (m, n), where n features are in columns, and m measurements are in rows.

Returns:
V : ndarray

Loadings (also called weights, principal directions, or principal axes): the eigenvectors of the covariance matrix of mean-subtracted A, stored as columns, with shape (n, n).

T : ndarray

PC scores (principal components): the coordinates of mean-subtracted A in its principal directions, with shape (m, n).

d : ndarray

Explained variance: the eigenvalues of the covariance matrix of mean-subtracted A, with size n.

See also

numpy.linalg.svd

Notes

Suppose we have a matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) with \(n\) columns of features \(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\) and \(m\) rows of measurements:

\[\begin{split}\mathbf{A} = \begin{bmatrix} | & | & & | \\ \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \\ | & | & & | \\ \end{bmatrix}.\end{split}\]

We can perform principal component analysis (PCA) [1] on the matrix using singular value decomposition (SVD).

Mean subtraction

We first calculate the mean of each feature, \(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n\), and store the means in the matrix

\[\begin{split}\mathbf{\bar{A}} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \begin{bmatrix} \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_n \end{bmatrix}.\end{split}\]

We then calculate the mean-subtracted data

\[\mathbf{B = A - \bar{A}}\]

to make the data zero mean.
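The mean-subtraction step can be sketched in NumPy as follows. This is a minimal illustration, not the library source; the data values and the names `feature_means`, `A_bar`, and `B` are assumptions chosen to mirror the symbols above.

```python
import numpy as np

# Illustrative data: m = 4 measurements (rows), n = 2 features (columns).
A = np.array([[1.0, 2.0],
              [3.0, 6.0],
              [5.0, 4.0],
              [7.0, 8.0]])

# Mean of each feature (column), shape (n,).
feature_means = A.mean(axis=0)

# Outer product of a ones column (m, 1) with the row of means (1, n),
# matching the definition of A-bar above.
A_bar = np.ones((A.shape[0], 1)) @ feature_means[np.newaxis, :]

# Mean-subtracted data; every column of B now has zero mean.
B = A - A_bar
```

In practice the explicit `A_bar` is not needed: `A - A.mean(axis=0)` gives the same result through broadcasting.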

Singular value decomposition

We compute the SVD of \(\mathbf{B}\):

\[\mathbf{B} = \mathbf{U \Sigma V}^T.\]

Right-multiplying both sides by \(\mathbf{V}\) and using \(\mathbf{V}^T\mathbf{V} = \mathbf{I}\), we get the principal components

\[\mathbf{T \equiv BV = U\Sigma},\]

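where \(\mathbf{V}\) contains the loadings. Because the covariance matrix of \(\mathbf{B}\) is \(\frac{1}{m-1}\mathbf{B}^T\mathbf{B} = \mathbf{V}\,\frac{\mathbf{\Sigma}^2}{m-1}\,\mathbf{V}^T\), the explained variance matrix \(\mathbf{D}\) is related to \(\mathbf{\Sigma}\) by

\[\mathbf{D} = \dfrac{1}{m-1}\mathbf{\Sigma}^2,\]

where \(m\) is the number of measurements (rows).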
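The whole procedure can be sketched in NumPy as below. The function `pca_svd_sketch` is a hypothetical re-implementation for illustration only and may differ from the actual source of `_pca_svd`. It assumes m >= n, uses the economy SVD so the outputs have the documented shapes, and divides by the number of measurements minus one, the sample covariance convention also used by `numpy.cov`.

```python
import numpy as np

def pca_svd_sketch(A):
    """Hypothetical illustration of PCA by SVD; not the library source."""
    m = A.shape[0]                      # number of measurements (rows)
    B = A - A.mean(axis=0)              # mean-subtracted data
    # Economy SVD: U is (m, n), s holds n singular values, Vt is (n, n).
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    V = Vt.T                            # loadings, one principal direction per column
    T = U * s                           # scores, equal to U @ diag(s) and to B @ V
    d = s**2 / (m - 1)                  # explained variance per component
    return V, T, d

# Random test data: 50 measurements of 3 features.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
V, T, d = pca_svd_sketch(A)

# Cross-check against the eigenvalues of the sample covariance matrix,
# reversed to descending order to match the singular-value ordering.
cov_eigvals = np.linalg.eigvalsh(np.cov(A, rowvar=False))[::-1]
```

Under these assumptions `d` should agree with `cov_eigvals` to numerical precision, and `T` should equal `(A - A.mean(axis=0)) @ V`.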

References

[1] Brunton, S., & Kutz, J. (2019). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge: Cambridge University Press. doi:10.1017/9781108380690