Source code for vampire.amath

import numpy as np


[docs]def mean_center(A): r""" Mean center the matrix `A` by subtracting its mean. Parameters ---------- A : ndarray Matrix with columns of features and rows of measurements. Returns ------- B : ndarray Mean-centered matrix. Notes ----- Suppose we have a matrix :math:`\mathbf{A} \in \mathbb{R}^{m \times n}` with :math:`n` columns of features :math:`\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n` and :math:`m` rows of measurements: .. math:: \mathbf{A} = \begin{bmatrix} | & | & & | \\ \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \\ | & | & & | \\ \end{bmatrix}. The means of the features are :math:`\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n`, respectively, and they are stored in the matrix .. math:: \mathbf{\bar{A}} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \begin{bmatrix} \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_n \end{bmatrix}. The mean-centered (mean-subtracted) data is then .. math:: \mathbf{B = A - \bar{A}}. """ A_bar = np.mean(A, axis=0) B = A - A_bar return B
[docs]def pca(A, method=None): r""" Principal component analysis of matrix `A`. Returns loadings, principal components, and explained variance. Parameters ---------- A : ndarray Matrix with shape (m, n), where n features are in columns, and m measurements are in rows. method : None or str, optional Algorithm used to compute PCA: ``None`` If m >= n, use eigen-decomposition algorithm. If m < n, use singular value decomposition algorithm. ``'eig'`` Eigen-decomposition algorithm. ``'svd'`` Singular value decomposition algorithm. Returns ------- V : ndarray Loadings, weights, principal directions, principal axes, eigenvector of covariance matrix of mean-subtracted A, with shape (n, n). T : ndarray PC score, principal components, coordinates of mean-subtracted A in its principal directions, with shape (m, n). d : ndarray Explained variance, eigenvalues of covariance matrix of mean-subtracted A, with size n. See Also -------- _pca_eig : Implementation of eigen-decomposition algorithm. _pca_svd : Implementation of singular value decomposition algorithm. sklearn.decomposition.PCA : Packaged implementation. """ if method is None: m, n = A.shape if m >= n: return _pca_eig(A) else: return _pca_svd(A) elif method == 'eig': return _pca_eig(A) elif method == 'svd': return _pca_svd(A) else: raise ValueError(f'Unrecognized method {method}. \n' 'Expect method from one of {"svd", "eig"}')
[docs]def _pca_eig(A): r""" Principal component analysis of matrix `A` by eigen decomposition. Returns loadings, principal components, and explained variance. Parameters ---------- A : ndarray Matrix with shape (m, n), where n features are in columns, and m measurements are in rows. Returns ------- V : ndarray Loadings, weights, principal directions, principal axes, eigenvector of covariance matrix of mean-subtracted A, with shape (n, n). T : ndarray PC score, principal components, coordinates of mean-subtracted A in its principal directions, with shape (m, n). d : ndarray Explained variance, eigenvalues of covariance matrix of mean-subtracted A, with size n. See Also -------- numpy.linalg.eigh, numpy.linalg.eig Notes ----- Suppose we have a matrix :math:`\mathbf{A} \in \mathbb{R}^{m \times n}` with :math:`n` columns of features :math:`\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n` and :math:`m` rows of measurements: .. math:: \mathbf{A} = \begin{bmatrix} | & | & & | \\ \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \\ | & | & & | \\ \end{bmatrix}. We can perform principal component analysis (PCA) [1]_ on the matrix using eigen-decomposition. .. rubric:: **Mean subtraction** We first calculate the mean of the features :math:`\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n`, respectively, and stored them in the matrix .. math:: \mathbf{\bar{A}} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \begin{bmatrix} \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_n \end{bmatrix}. We then calculate the mean-subtracted data .. math:: \mathbf{B = A - \bar{A}} to make the data zero mean. .. rubric:: **Covariance matrix** The covariance matrix :math:`\mathbf{C}` of the rows of :math:`\mathbf{B}` is .. math:: \mathbf{C} = \dfrac{1}{n-1} \mathbf{B}^T \mathbf{B}. The eigenvalue decomposition of the symmetric matrix :math:`\mathbf{C}` gives .. math:: \mathbf{C} = \mathbf{V}\mathbf{D}\mathbf{V}^{-1}, where :math:`\mathbf{V}` is an orthogonal matrix containing the eigenvectors, and :math:`\mathbf{D}` is a diagonal matrix containing the eigenvalues. .. rubric:: **Principal components** The principal components :math:`\mathbf{T}` is defined as .. math:: \mathbf{T} \equiv \mathbf{BV}, where :math:`\mathbf{V}` is called the loadings. References ---------- .. [1] Brunton, S., & Kutz, J. (2019). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge: Cambridge University Press. doi:10.1017/9781108380690 """ # A_bar = np.mean(A, axis=0) # B = A - A_bar B = mean_center(A) C = B.T @ B / (B.shape[0] - 1) d, V = np.linalg.eigh(C) # d is diagonal entries of D # sort the eigenvalues and eigenvectors in descending order # the convention of `eigh()` gives them in ascending order sorting_index = np.arange(len(d))[::-1] d = d[sorting_index] V = V[:, sorting_index] T = B @ V return V, T, d
[docs]def _pca_svd(A): r""" Principal component analysis of matrix `A` by singular value decomposition. Returns loadings, principal components, and explained variance. Parameters ---------- A : ndarray Matrix with shape (m, n), where n features are in columns, and m measurements are in rows. Returns ------- V : ndarray Loadings, weights, principal directions, principal axes, eigenvector of covariance matrix of mean-subtracted A, with shape (n, n). T : ndarray PC score, principal components, coordinates of mean-subtracted A in its principal directions, with shape (m, n). d : ndarray Explained variance, eigenvalues of covariance matrix of mean-subtracted A, with size n. See Also -------- numpy.linalg.svd Notes ----- Suppose we have a matrix :math:`\mathbf{A} \in \mathbb{R}^{m \times n}` with :math:`n` columns of features :math:`\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n` and :math:`m` rows of measurements: .. math:: \mathbf{A} = \begin{bmatrix} | & | & & | \\ \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \\ | & | & & | \\ \end{bmatrix}. We can perform principal component analysis (PCA) [1]_ on the matrix using singular value decomposition (SVD). .. rubric:: **Mean subtraction** We first calculate the mean of the features :math:`\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n`, respectively, and stored them in the matrix .. math:: \mathbf{\bar{A}} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \begin{bmatrix} \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_n \end{bmatrix}. We then calculate the mean-subtracted data .. math:: \mathbf{B = A - \bar{A}} to make the data zero mean. .. rubric:: **Singular value decomposition** We compute the SVD of :math:`\mathbf{B}`: .. math:: \mathbf{B} = \mathbf{U \Sigma V}^T. Multiply :math:`\mathbf{V}` at the right on both sides, we get the principal components .. math:: \mathbf{T \equiv BV = U\Sigma}, where :math:`\mathbf{V}` is the loading. The explained variance matrix :math:`\mathbf{D}` is related to :math:`\mathbf{\Sigma}` by .. math:: \mathbf{D} = \dfrac{1}{n-1}\mathbf{\Sigma}^2. References ---------- .. [1] Brunton, S., & Kutz, J. (2019). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge: Cambridge University Press. doi:10.1017/9781108380690 """ n = len(A) # A_bar = np.mean(A, axis=0) # B = A - A_bar B = mean_center(A) U, s, VT = np.linalg.svd(B, full_matrices=False) V = VT.T T = U @ np.diag(s) d = s ** 2 / (n - 1) return V, T, d
[docs]def get_rotation_matrix(A, B): r""" Returns optimal rotation matrix to align 2D coordinates `A` to `B`. Parameters ---------- A : ndarray Matrix to be rotated, with shape (2, n). B : ndarray Matrix to be aligned to, with shape (2, n). Returns ------- R : ndarray Optimal rotation matrix to be applied to `A`, with shape (2, 2). See Also -------- scipy.spatial.transform.Rotation.align_vectors : Aligns 3D coordinates. Notes ----- We want to align 2D coordinates of object A to that of object B, represented by the matrices .. math:: \mathbf{A} = \begin{bmatrix} — & \mathbf{x}_A & — \\ — & \mathbf{y}_A & — \\ \end{bmatrix}, \mathbf{B} = \begin{bmatrix} — & \mathbf{x}_B & — \\ — & \mathbf{y}_B & — \\ \end{bmatrix}, respectively, where :math:`\mathbf{x}_A, \mathbf{x}_B` are the x-coordinates, and :math:`\mathbf{y}_A, \mathbf{y}_B` are the y-coordinates. It is equivalent to finding the optimal rotation matrix :math:`\mathbf{R}` such that :math:`\mathbf{A}` after rotation has the minimum sum of squared distance loss :math:`L(\mathbf{R})` with :math:`\mathbf{B}`: .. math:: L(\mathbf{R}) = \Vert \mathbf{RA} - \mathbf{B} \Vert_2^2. .. rubric:: **Kabsch algorithm** The optimal rotation matrix can be found using the Kabsch algorithm [1]_, [2]_, [3]_: 1. Compute the covariance matrix .. math:: \mathbf{C = AB}^T 2. Compute the SVD of the covariance matrix .. math:: \mathbf{C = U\Sigma V}^T 3. The optimal rotation matrix is .. math:: \mathbf{R = VU}^T References ---------- .. [1] Lydia E. Kavraki, Molecular Distance Measures. OpenStax CNX. (2007) http://cnx.org/contents/1d5f91b1-dc0b-44ff-8b4d-8809313588f2@23 * The section "Optimal Alignment for lRMSD Using Rotation Matrices" describes and proves the Kabsch algorithm. .. [2] Dryden, I. L. & Mardia, K. V. Statistical Shape Analysis, with Applications in R 2nd edn. https://doi.org/10.1002/9781119072492 (2016). * Section 4.1.1 "Procrustes distances" describes and proves the Kabsch algorithm with Lemma 4.1. .. [3] Wu, PH., Phillip, J., Khatau, S. et al. Evolution of cellular morpho-phenotypes in cancer metastasis. Sci Rep 5, 18437 (2016). https://doi.org/10.1038/srep18437 * Supplemental information section "Decomposition of 2-dimensional shape and identification of shape mode" describes the implementation of the Kabsch algorithm in the context of VAMPIRE methodology. """ C = A @ B.T U, s, VT = np.linalg.svd(C) V = VT.T R = V @ U.T return R