Partial least squares: variance explained by each component in sklearn

Question


I am running PLSRegression from sklearn and want to keep only the components that explain a certain amount of variance, as one would do in PCA, for example.

Is there a way to find out how much variance is explained by each component in PLS?

Thanks in advance.


scikit-learn pls

user2043236 12 Aug 14 at 14:10


1 answer

SpinoPi · Answer 1 · 2017-08-16T06:32:03+0000

I had the same requirement: calculating how much variance each component explains. I am new to PLS and not a native English speaker, so please take my solution as a reference only.

Background: with deflation_mode set to "regression" (the default, and the mode PLSRegression uses), the estimated Y in PLSRegression can be written as [1]:

$$ Y = T Q^\top + \mathrm{Err} $$

where T is x_scores_, Q is y_loadings_ and Err is the residual. Y here is the centered and scaled target, because PLSRegression uses scale=True by default. The product TQ' gives the estimate of Y from all components. Therefore, to find out how much variance is explained by the first component alone, we can use only the first column of x_scores_ and of y_loadings_ to compute the estimate Y1:

$$ \hat{Y}_1 = t_1 q_1^\top $$

where t_1 = T[:, 0] and q_1 = Q[:, 0] are the first columns of x_scores_ and y_loadings_.

The Python code below computes R squared for each component.

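import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score

# X: (n_samples, n_features), Y_true: (n_samples,) or (n_samples, n_targets)
Y_true = np.asarray(Y_true, dtype=float)
n_comp = 3
pls = PLSRegression(n_components=n_comp)  # scale=True by default
pls.fit(X, Y_true)

# PLSRegression centers Y and scales it by its std (ddof=1), so each
# component's contribution is mapped back to the original units of Y.
y_mean = Y_true.mean(axis=0)
y_std = Y_true.std(axis=0, ddof=1)

r2_sum = 0
for i in range(n_comp):
    t_i = pls.x_scores_[:, [i]]    # i-th x score, shape (n_samples, 1)
    q_i = pls.y_loadings_[:, [i]]  # i-th y loading, shape (n_targets, 1)
    Y_pred = t_i @ q_i.T * y_std + y_mean
    r2_i = round(r2_score(Y_true, Y_pred), 3)
    r2_sum += r2_i
    print('R2 for %d component: %g' % (i + 1, r2_i))
print('R2 for all components: %g' % r2_sum)  # sum of the above
print('R2 for all components: %g' % round(r2_score(Y_true, pls.predict(X)), 3))  # from predict()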

Output:

R2 for 1 component: 0.633
R2 for 2 component: 0.221
R2 for 3 component: 0.104
R2 for all components: 0.958
R2 for all components: 0.958
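A short derivation of why the single-component R² values add up to the R² of the full model, as in the output above. This is my own reasoning from the standard regression-mode deflation steps, not something quoted from the sklearn documentation. Notation introduced here: for one target column, y_0 is the centered and scaled target, t_i is column i of T (x_scores_) and q_i is the matching entry of Q (y_loadings_).

$$
\begin{aligned}
q_i &= \frac{t_i^\top y_{i-1}}{t_i^\top t_i} = \frac{t_i^\top y_0}{t_i^\top t_i}, \qquad y_i = y_{i-1} - q_i t_i, \\
R_i^2 &= 1 - \frac{\lVert y_0 - q_i t_i \rVert^2}{\lVert y_0 \rVert^2}
       = \frac{2 q_i\, t_i^\top y_0 - q_i^2\, t_i^\top t_i}{\lVert y_0 \rVert^2}
       = \frac{q_i^2\, t_i^\top t_i}{\lVert y_0 \rVert^2}, \\
R^2 &= 1 - \frac{\lVert y_0 - \sum_i q_i t_i \rVert^2}{\lVert y_0 \rVert^2}
     = \frac{\sum_i q_i^2\, t_i^\top t_i}{\lVert y_0 \rVert^2}
     = \sum_i R_i^2 .
\end{aligned}
$$

Both the second equality for q_i and the vanishing cross terms in R² use the fact that the x scores are mutually orthogonal (t_i' t_j = 0 for i ≠ j). Un-scaling multiplies the residual and total sums of squares by the same factor, so R² is the same in scaled and original units; with several targets, r2_score averages the per-target values uniformly by default, so the sum still holds.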

[1] Be careful with this expression: the terms "score", "weight" and "loading", and what they mean, can differ slightly between PLS algorithms and implementations.
