Keep a specific original axis as one of the PCA components
I have a numpy array called "data" that has 500 rows and 500 columns. Using the PCA from sklearn I can compress this down to 500 rows and 15 columns. I reckon that I essentially go from 500 axes and 500 points to 15 axes and 500 points. The axes are all orthogonal and explain my data very well.
But I want to know if there is any way to guarantee that one of the 15 axes (which I get after running the PCA) is also one of the original 500. That is, can I keep one of the original axes and use PCA (or some other method) to find the remaining 14?
My code is below:
from sklearn.decomposition import PCA
#data is some 500x500 numpy array
pca = PCA(n_components = 15)
pca_result = pca.fit_transform(data)
#pca_result is a 500x15 numpy array
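To make the "15 orthogonal axes" picture concrete, here is a minimal sketch that fills data with random placeholder values (an assumption, since the real array isn't shown). It checks that the new axes, stored as the rows of pca.components_, are orthonormal vectors in the original 500-dimensional space.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder data: random values standing in for the real 500x500 array (assumption)
rng = np.random.default_rng(0)
data = rng.standard_normal((500, 500))

pca = PCA(n_components=15)
pca_result = pca.fit_transform(data)

print(pca_result.shape)        # (500, 15): 500 points, 15 new coordinates
print(pca.components_.shape)   # (15, 500): each row is one new axis in the original space

# The principal axes are orthonormal: their Gram matrix is the 15x15 identity
gram = pca.components_ @ pca.components_.T
print(np.allclose(gram, np.eye(15)))  # True
```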
You can simply omit the axis you want to keep from the data:
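import numpy as np
from sklearn.decomposition import PCA

# special_axis is the index of the original column you want to keep
mask = np.ones(data.shape[1], dtype=bool)  # np.bool was removed in NumPy 1.24
mask[special_axis] = False
data_new = data[:, mask]  # every column except the one to keep
pca_transformed = PCA(n_components=14).fit_transform(data_new)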
This is equivalent to removing the data's projection onto that feature's axis before running PCA. You can then append the original column to the PCA result if you like:
stacked_result = np.hstack([pca_transformed, data[:, [special_axis]]])
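Putting the steps of this answer together, a self-contained sketch might look like the following. The helper name pca_keeping_axis, the random placeholder data and the choice special_axis=3 are hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_keeping_axis(data, special_axis, n_components=15):
    """Hypothetical helper: keep column `special_axis` unchanged and run PCA
    with n_components - 1 components on the remaining columns."""
    mask = np.ones(data.shape[1], dtype=bool)
    mask[special_axis] = False                      # leave out the column to keep
    pca = PCA(n_components=n_components - 1)
    pca_transformed = pca.fit_transform(data[:, mask])
    # Append the untouched original column as the last output column
    return np.hstack([pca_transformed, data[:, [special_axis]]]), pca

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 500))              # placeholder data (assumption)
result, pca = pca_keeping_axis(data, special_axis=3)
print(result.shape)                                 # (500, 15)
print(np.array_equal(result[:, -1], data[:, 3]))    # True
```

Because the PCA never sees the kept column, each of the 14 principal axes, viewed as a direction in the original 500-dimensional space, has zero weight on that feature.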
I think what you want is to first do a least-squares linear fit of the data onto the axis you want to keep:
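import numpy as np

# column_number is the index of the original column you want to keep
axis_to_keep = data[:, column_number][:, np.newaxis]  # shape (500, 1)
# next line solves axis_to_keep*x = data in the least-squares sense
x = np.linalg.lstsq(axis_to_keep, data, rcond=None)[0]  # shape (1, 500)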
Then subtract the fit generated by that model from data:
data_2 = data - np.dot(axis_to_keep,x)
At this point you can run PCA with 14 components on data_2. Your forced axis will (almost certainly) not be orthogonal to the others.
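A self-contained sketch of this regress-then-PCA approach follows, assuming random placeholder data and a hypothetical column_number of 3. Stacking the kept column onto the result mirrors the other answer and is an addition, not part of this one.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 500))   # placeholder data (assumption)
column_number = 3                        # hypothetical index of the column to keep

axis_to_keep = data[:, [column_number]]  # shape (500, 1)
# Least-squares coefficients x minimising ||axis_to_keep @ x - data||
x = np.linalg.lstsq(axis_to_keep, data, rcond=None)[0]   # shape (1, 500)
data_2 = data - axis_to_keep @ x                          # residuals of the fit

# The kept column fits itself exactly, so its residual is (numerically) zero
print(np.allclose(data_2[:, column_number], 0))  # True

pca_result = PCA(n_components=14).fit_transform(data_2)
stacked_result = np.hstack([pca_result, axis_to_keep])
print(stacked_result.shape)                      # (500, 15)
```

Every residual column of data_2 is orthogonal to axis_to_keep as a 500-vector. PCA centers the columns first, though, and the fit above has no intercept, so the resulting scores are generally still correlated with the kept column.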