Partial Least Squares Variance Explained by components in sklearn


I am trying to run a PLS regression with PLSRegression from sklearn, and I want to keep only those components that explain a given level of variance, as in PCA.

Is there a way to know how much variance is explained by each component in PLS?

There are 2 best solutions below

Following up on the answer by @SpinoPi, I wrote a function to compute and plot the variance explained by each PLS component.

import matplotlib.pyplot as plt
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score

def pls_explained_variance(pls, X, Y_true, do_plot=True):
    r2 = np.zeros(pls.n_components)
    x_transformed = pls.transform(X) # Project X into low dimensional basis
    for i in range(pls.n_components):
        # Prediction from component i alone, mapped back to Y's original units.
        # Note: _y_std and _y_mean are private attributes and may change between sklearn versions.
        Y_pred = (np.outer(x_transformed[:, i], pls.y_loadings_[:, i]) * pls._y_std
                  + pls._y_mean)
        r2[i] = r2_score(Y_true, Y_pred)
    overall_r2 = r2_score(Y_true, pls.predict(X))  # Use all components together.

    if do_plot:
        component = np.arange(pls.n_components) + 1
        plt.plot(component, r2, '.-')
        plt.xticks(component)
        plt.xlabel('PLS Component #'), plt.ylabel('r2')
        plt.title(f'Summed individual r2: {np.sum(r2):.3f}, '
                  f'Overall r2: {overall_r2:.3f}')
        plt.show()

    return r2, overall_r2

# Example usage.
pls = PLSRegression(n_components=2).fit(X_train, Y_train)
pls_explained_variance(pls, X_test, Y_test)

[Figure: r2 by PLS component]

I have the same need to calculate each component's explained variance. I am new to PLS and not a native English speaker, so please take my solution as a reference only.

Background: with deflation_mode set to "regression", which is the default for PLSRegression, the estimated Y can be calculated with this expression [1]:

Y = TQ' + Err

where T is x_scores_ and Q is y_loadings_. This expression gives the estimated Y from all components together. So to find out how much variance the first component explains, we can use the first column of x_scores_ and the first column of y_loadings_ to calculate the estimated Y1:

Y1 = T[0]Q[0]' + Err

The Python code below calculates each component's R-squared.

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score

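# X and Y_true are assumed to be NumPy arrays holding your data.
n_components = 3
pls = PLSRegression(n_components=n_components)
pls.fit(X, Y_true)
# PLSRegression standardizes Y internally, so map predictions back to the original units.
y_mean = Y_true.mean(axis=0)
y_std = Y_true.std(axis=0, ddof=1)
r2_sum = 0
for i in range(n_components):
    # Prediction from component i alone: outer product of its score and loading vectors.
    Y_pred = np.outer(pls.x_scores_[:, i], pls.y_loadings_[:, i]) * y_std + y_mean
    r2_i = round(r2_score(Y_true, Y_pred), 3)
    r2_sum += r2_i
    print('R2 for %d component: %g' % (i + 1, r2_i))
print('R2 for all components: %g' % r2_sum)  # Sum of the above.
print('R2 for all components: %g' % round(r2_score(Y_true, pls.predict(X)), 3))  # Calculated with PLSRegression's predict.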

Output:

R2 for 1 component: 0.633
R2 for 2 component: 0.221
R2 for 3 component: 0.104
R2 for all components: 0.958
R2 for all components: 0.958

[1] Please treat this expression with care: the terminology and the values of "score", "weight" and "loading" can differ slightly between calculation methods.