Sklearn PCA not working

I am playing with sklearn's PCA and it is behaving strangely.

from sklearn.decomposition import PCA
import numpy as np
identity = np.identity(10)
pca = PCA(n_components=10)
augmented_identity = pca.fit_transform(identity)
np.linalg.norm(identity - augmented_identity)

4.5997749080745738

      

Note that I am setting the number of components to 10. Shouldn't the norm be 0?

Any insight as to why this is not the case would be appreciated.

1 answer


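PCA computes its orthogonal components from the covariance matrix of the data, but the input to sklearn's PCA is the data matrix itself (rows are samples, columns are features), not a covariance or correlation matrix. Passing np.identity(10) therefore treats it as 10 samples with 10 features each, not as an identity covariance matrix. For data whose covariance is approximately the identity, the covariance is essentially unchanged by the transform.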
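For the question's own example, a minimal sketch of what fit_transform does may help. It assumes the default PCA settings (no whitening) and uses the fitted attributes mean_ and components_; the variable names are only illustrative. The transformed matrix is the centered input expressed in the principal-axis basis, so it is not expected to equal the input, while inverse_transform maps it back.

```python
import numpy as np
from sklearn.decomposition import PCA

identity = np.identity(10)  # treated as 10 samples with 10 features each
pca = PCA(n_components=10)
transformed = pca.fit_transform(identity)

# fit_transform centers the data, then projects it onto the principal axes
centered = identity - pca.mean_
projected = centered @ pca.components_.T

# mapping back to the original feature space recovers the input
reconstructed = pca.inverse_transform(transformed)
reconstruction_error = np.linalg.norm(identity - reconstructed)
print(reconstruction_error)  # close to zero, up to floating-point error
```

With all 10 components kept, comparing the input against the reconstruction, not against the transformed coordinates, is the check that should come out near 0.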



import numpy as np
from sklearn.decomposition import PCA

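# 100,000 samples of a 10-dimensional standard Gaussian (identity covariance matrix)
X = np.random.randn(100000, 10)

pca = PCA(n_components=10)
X_transformed = pca.fit_transform(X)

# both covariance matrices are close to the identity, so the difference is small
np.linalg.norm(np.cov(X.T) - np.cov(X_transformed.T))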

Out[219]: 0.044691263454134933
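A related check, sketched here under the assumption of a fixed seed and a smaller sample size than above (both chosen only to keep the example quick and reproducible): the covariance of the transformed data should be diagonal, with the diagonal matching pca.explained_variance_.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)        # fixed seed, only for reproducibility
X = rng.standard_normal((10000, 10))  # rows are samples, columns are features

pca = PCA(n_components=10)
X_transformed = pca.fit_transform(X)

# covariance of the transformed data (columns are the principal components)
cov_transformed = np.cov(X_transformed, rowvar=False)
off_diagonal = cov_transformed - np.diag(np.diag(cov_transformed))

# diagonal entries should match the variances reported by the fitted model
print(np.allclose(np.diag(cov_transformed), pca.explained_variance_))
# off-diagonal entries should be near zero: the components are uncorrelated
print(np.abs(off_diagonal).max())
```

This is the sense in which PCA works from the covariance matrix: it rotates the centered data so that the covariance becomes diagonal.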

      
