I need to write an adjusted R-squared scoring function using make_scorer from sklearn.metrics. Adjusted R-squared takes an extra parameter, the number of features, and I am finding it hard to pass that in. I am building this scorer because I want to use it in mlxtend.feature_selection.SequentialFeatureSelector.
import numpy as np
from sklearn.metrics import make_scorer
# x = y_true
# y = y_pred
# major_X = feature dataset
def Adj_r2(x, y, major_X):
    zx = (x-np.mean(x))/np.std(x, ddof=1)
    zy = (y-np.mean(y))/np.std(y, ddof=1)
    r = np.sum(zx*zy)/(len(x)-1)
    r2 = pow(r, 2)
    # major_X.shape[1] gives the number of features that were used to make the prediction
    return 1 - ((1-r2) * (len(x)-1)/(len(x)-major_X.shape[1]-1))
# Using make_scorer, to put it later in Sequential Feature Selector
scorer_adj_r2 = make_scorer(Adj_r2, greater_is_better=True)
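You could define a custom scorer with the signature func(estimator, X, y), as suggested in this answer. Because such a scorer receives the feature matrix X directly, the number of features is available as X.shape[1], so no extra parameter is needed. An adjusted R-squared scorer defined this way is equivalent to statsmodels' adjusted R-squared definition.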
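A minimal sketch of such a scorer is below. The name adjusted_r2 is my own choice, and the sketch computes R-squared as 1 - SS_res/SS_tot rather than as the squared correlation used in the question; the two agree for an ordinary least-squares fit with an intercept evaluated on its training data, but can differ otherwise. The guard against too few samples is also an addition of mine.

```python
import numpy as np


def adjusted_r2(estimator, X, y):
    """Adjusted R-squared scorer with the (estimator, X, y) signature."""
    y_true = np.asarray(y, dtype=float).ravel()
    y_pred = np.asarray(estimator.predict(X), dtype=float).ravel()
    n, p = X.shape  # samples and features seen by this call

    # The correction term is undefined unless n > p + 1.
    if n - p - 1 <= 0:
        raise ValueError("adjusted R-squared needs n_samples > n_features + 1")

    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

Since the function already has the scorer signature, it should not be wrapped in make_scorer. Note that p is taken from whatever X the scorer is called with, so during feature selection it reflects the size of the current feature subset.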
You can then pass the custom scorer to mlxtend's SequentialFeatureSelector through its scoring parameter.