vampire.amath._pca_eig
- vampire.amath._pca_eig(A)
Principal component analysis of matrix A by eigen decomposition.
Returns loadings, principal components, and explained variance.
- Parameters:
- A : ndarray
Matrix with shape (m, n), where n features are in columns, and m measurements are in rows.
- Returns:
- V : ndarray
Loadings (also called weights, principal directions, or principal axes): eigenvectors of the covariance matrix of mean-subtracted A, with shape (n, n).
- T : ndarray
PC scores (principal components): coordinates of mean-subtracted A in its principal directions, with shape (m, n).
- d : ndarray
Explained variance: eigenvalues of the covariance matrix of mean-subtracted A, with size n.
Notes
Suppose we have a matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) with \(n\) columns of features \(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\) and \(m\) rows of measurements:
\[\begin{split}\mathbf{A} = \begin{bmatrix} | & | & & | \\ \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \\ | & | & & | \\ \end{bmatrix}.\end{split}\]
We can perform principal component analysis (PCA) [1] on the matrix using eigendecomposition.
Mean subtraction
We first calculate the means of the features, \(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n\), and store them in the matrix
\[\begin{split}\mathbf{\bar{A}} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \begin{bmatrix} \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_n \end{bmatrix}.\end{split}\]
We then calculate the mean-subtracted data
\[\mathbf{B = A - \bar{A}}\]
so that every feature (column) has zero mean.
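The mean-subtraction step can be sketched in NumPy as follows. This is only an illustration, not the library's actual implementation; the example data and the variable names `x_bar`, `A_bar`, and `B` are assumptions chosen to mirror the symbols above.

```python
import numpy as np

# Hypothetical example data: m = 5 measurements (rows), n = 3 features (columns).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))

# Row vector of feature means, shape (1, n).
x_bar = A.mean(axis=0, keepdims=True)

# A_bar stacks the row of means m times: a column of ones times x_bar.
A_bar = np.ones((A.shape[0], 1)) @ x_bar

# Mean-subtracted data; every column of B has zero mean.
B = A - A_bar
```

In practice NumPy broadcasting makes the explicit column of ones unnecessary: `A - A.mean(axis=0)` gives the same `B`.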
Covariance matrix
The covariance matrix \(\mathbf{C}\) of the \(n\) features, estimated from the \(m\) rows of \(\mathbf{B}\), is
\[\mathbf{C} = \dfrac{1}{m-1} \mathbf{B}^T \mathbf{B}.\]
The eigenvalue decomposition of the symmetric matrix \(\mathbf{C}\) gives
\[\mathbf{C} = \mathbf{V}\mathbf{D}\mathbf{V}^{-1},\]
where \(\mathbf{V}\) is an orthogonal matrix containing the eigenvectors in its columns (so \(\mathbf{V}^{-1} = \mathbf{V}^T\)), and \(\mathbf{D}\) is a diagonal matrix containing the eigenvalues.
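A minimal NumPy sketch of the covariance and eigendecomposition step is given below. It makes several assumptions that the text above does not confirm for the library itself: the covariance is normalized by the number of measurements minus one (the `np.cov` default), `np.linalg.eigh` is used because the matrix is symmetric, and the eigenpairs are sorted by decreasing eigenvalue.

```python
import numpy as np

# Hypothetical example data: m = 6 measurements, n = 3 features.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))
m = A.shape[0]

# Mean-subtracted data.
B = A - A.mean(axis=0)

# Sample covariance of the features, shape (n, n).
C = B.T @ B / (m - 1)

# eigh is intended for symmetric matrices; it returns real eigenvalues
# in ascending order and orthonormal eigenvectors as columns.
d, V = np.linalg.eigh(C)

# Assumed convention: reorder so the largest eigenvalue comes first.
order = np.argsort(d)[::-1]
d, V = d[order], V[:, order]
D = np.diag(d)
```

Because `V` is orthogonal, `V @ D @ V.T` reconstructs `C` up to floating-point error, and `C` agrees with `np.cov(A, rowvar=False)`.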
Principal components
The principal components \(\mathbf{T}\) are defined as
\[\mathbf{T} \equiv \mathbf{BV},\]
where the columns of \(\mathbf{V}\) are called the loadings.
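Putting the steps together, the whole procedure might look like the sketch below. The function `pca_eig_sketch` is a hypothetical stand-in written for illustration; it is not the source of `vampire.amath._pca_eig`, and the descending eigenvalue order and the normalization by the number of measurements minus one are assumptions.

```python
import numpy as np

def pca_eig_sketch(A):
    """Hypothetical PCA by eigendecomposition; returns (V, T, d)."""
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    # Mean subtraction: each column of B has zero mean.
    B = A - A.mean(axis=0)
    # Sample covariance of the features, shape (n, n).
    C = B.T @ B / (m - 1)
    # Eigendecomposition of the symmetric covariance matrix.
    d, V = np.linalg.eigh(C)
    # Assumed convention: sort by decreasing explained variance.
    order = np.argsort(d)[::-1]
    d, V = d[order], V[:, order]
    # Principal components: coordinates of B in the principal directions.
    T = B @ V
    return V, T, d

# Hypothetical example data: m = 8 measurements, n = 3 features.
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 3))
V, T, d = pca_eig_sketch(A)
```

Two properties are worth checking on the result: the columns of `T` are uncorrelated, with sample variances equal to the entries of `d`, and `T @ V.T + A.mean(axis=0)` recovers `A`.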
References
[1] Brunton, S., & Kutz, J. (2019). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge: Cambridge University Press. doi:10.1017/9781108380690