vampire.amath._pca_eig

vampire.amath._pca_eig(A)

Principal component analysis of matrix A by eigen decomposition.

Returns loadings, principal components, and explained variance.

Parameters:
A : ndarray

Matrix with shape (m, n), where n features are in columns, and m measurements are in rows.

Returns:
V : ndarray

Loadings (also called weights, principal directions, or principal axes): eigenvectors of the covariance matrix of mean-subtracted A, stored as columns, with shape (n, n).

T : ndarray

PC scores (principal components): coordinates of mean-subtracted A in its principal directions, with shape (m, n).

d : ndarray

Explained variance: eigenvalues of the covariance matrix of mean-subtracted A, with size n.

Notes

Suppose we have a matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) with \(n\) columns of features \(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\) and \(m\) rows of measurements:

\[\begin{split}\mathbf{A} = \begin{bmatrix} | & | & & | \\ \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \\ | & | & & | \\ \end{bmatrix}.\end{split}\]

We can perform principal component analysis (PCA) [1] on the matrix using eigen-decomposition.

Mean subtraction

We first calculate the means of the features, \(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n\), and store them in the matrix

\[\begin{split}\mathbf{\bar{A}} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \begin{bmatrix} \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_n \end{bmatrix}.\end{split}\]

We then calculate the mean-subtracted data

\[\mathbf{B = A - \bar{A}}\]

to make the data zero mean.
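The mean-subtraction step can be sketched in NumPy as follows. This is a minimal illustration, not the library's implementation; the random data and the names `x_bar`, `A_bar`, and `B` are assumptions chosen to mirror the symbols above.

```python
import numpy as np

# Hypothetical example data: m = 5 measurements (rows), n = 3 features (columns).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))

# Row vector of feature means, shape (1, n).
x_bar = A.mean(axis=0, keepdims=True)

# A_bar as the product of a column of ones with the row of means, shape (m, n).
A_bar = np.ones((A.shape[0], 1)) @ x_bar

# Mean-subtracted data: every column of B has zero mean.
B = A - A_bar
```

In practice NumPy broadcasting makes the explicit `A_bar` unnecessary, since `A - A.mean(axis=0)` gives the same result.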

Covariance matrix

The sample covariance matrix \(\mathbf{C}\) of the features (columns) of \(\mathbf{B}\), estimated from its \(m\) rows, is

\[\mathbf{C} = \dfrac{1}{m-1} \mathbf{B}^T \mathbf{B}.\]

The eigenvalue decomposition of the symmetric matrix \(\mathbf{C}\) gives

\[\mathbf{C} = \mathbf{V}\mathbf{D}\mathbf{V}^{-1},\]

where \(\mathbf{V}\) is an orthogonal matrix containing the eigenvectors, and \(\mathbf{D}\) is a diagonal matrix containing the eigenvalues.
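A rough NumPy sketch of the covariance and eigendecomposition steps is shown below. It assumes the sample covariance is normalized by the number of measurements minus one (the convention `np.cov` uses) and uses `np.linalg.eigh`, which is intended for symmetric matrices; whether the library uses the same routine or the same eigenvalue ordering is not stated here, so the descending sort is an assumption.

```python
import numpy as np

# Hypothetical example data: m = 6 measurements (rows), n = 3 features (columns).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))
m = A.shape[0]

# Mean-subtracted data.
B = A - A.mean(axis=0)

# Sample covariance of the features, shape (n, n).
C = B.T @ B / (m - 1)

# eigh returns real eigenvalues in ascending order and orthonormal
# eigenvectors as the columns of V.
d, V = np.linalg.eigh(C)

# Assumed convention: reorder so the largest eigenvalue comes first.
order = np.argsort(d)[::-1]
d, V = d[order], V[:, order]
```

Because \(\mathbf{V}\) is orthogonal, \(\mathbf{V}^{-1} = \mathbf{V}^T\), so `V @ np.diag(d) @ V.T` reconstructs `C` up to floating-point error.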

Principal components

The principal components \(\mathbf{T}\) are defined as

\[\mathbf{T} \equiv \mathbf{BV},\]

where the columns of \(\mathbf{V}\) are called the loadings.
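The projection can be checked numerically with a short NumPy sketch. The data and variable names are illustrative assumptions, and `np.cov` with `np.linalg.eigh` stands in for whatever the library does internally. The check shows that the columns of \(\mathbf{T}\) are uncorrelated and that their sample variances equal the eigenvalues.

```python
import numpy as np

# Hypothetical example data: m = 8 measurements (rows), n = 3 features (columns).
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))

# Mean-subtract, then eigendecompose the sample covariance of the features.
B = A - A.mean(axis=0)
d, V = np.linalg.eigh(np.cov(B, rowvar=False))

# Principal components (scores): coordinates of B in the basis given by V.
T = B @ V

# Covariance of the scores; it is diagonal, with the eigenvalues d on the diagonal.
cov_T = np.cov(T, rowvar=False)

# Since V is orthogonal, the mean-subtracted data is recovered from the scores.
B_rec = T @ V.T
```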

References

[1]

Brunton, S., & Kutz, J. (2019). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge: Cambridge University Press. doi:10.1017/9781108380690