VotingRegressor with MultiOutputRegressor (Python, SKLearn)

I have a material property dataset where I use my features, X, to predict two (2) mechanical properties, y (a dataframe with two columns). I cannot share the data for proprietary reasons. I have used a voting model before, but only when y had one dimension.

For this problem, I have successfully used GridSearchCV to find the best multi-output estimators for several regression models (e.g. SVR, KNN). Now I would like to combine those best estimators in a voting model. However, I keep receiving one of two errors, with THE SAME array/dataframe for y: either ValueError: y must have at least two dimensions for multi-output regression but has only one, or ValueError: y should be a 1d array, got an array of shape (637, 2) instead.

I am unsure if I am placing the MultiOutputRegressor in the right place with regard to the Voting model:

cv = MultiOutputRegressor(VotingRegressor(estimators = best_classifiers, verbose=verbosity, n_jobs=-1))
cv.fit(X_train,y_train)

ValueError: y must have at least two dimensions for multi-output regression but has only one.

I then tried this:

print(y_train.to_numpy().reshape(-1,2)) #this shows an array with 2 columns
cv = VotingRegressor(estimators = best_classifiers, verbose=verbosity, n_jobs=-1)
wrapper = MultiOutputRegressor(cv)
wrapper.fit(X_train,y_train.to_numpy().reshape(-1,2))

ValueError: y must have at least two dimensions for multi-output regression but has only one.

I also tried it without the MultiOutputRegressor wrapper:

cv = VotingRegressor(estimators = best_classifiers, verbose=verbosity, n_jobs=-1)
cv.fit(X_train,y_train)

ValueError: y should be a 1d array, got an array of shape (637, 2) instead.

As background, best_classifiers (simplified) prints as:

[('knn', Pipeline(steps=[('imputation', SimpleImputer()), ('scaler', RobustScaler()),
                ('knn',
                 MultiOutputRegressor(estimator=KNeighborsRegressor(n_neighbors=10,
                                                                    weights='distance')))])),
 ('svm', Pipeline(steps=[('imputation', SimpleImputer()), ('scaler', RobustScaler()),
                ('svm', MultiOutputRegressor(estimator=SVR(C=100.0)))]))]
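
One possible reading of the two errors, sketched with synthetic data because the real dataset cannot be shared. VotingRegressor only accepts a 1d y, so fitting it directly on two columns fails. Wrapping it in MultiOutputRegressor hands each inner estimator a single column, and the MultiOutputRegressor already sitting at the end of each pipeline then rejects that 1d y. The sketch below assumes the pipelines can be rebuilt with the bare regressors from the printout and that the outer MultiOutputRegressor is the only multi-output wrapper. The random data and the helper names build_estimators and fit_error are hypothetical, not part of the original code.

```python
import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.impute import SimpleImputer
from sklearn.multioutput import MultiOutputRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVR

# Synthetic stand-in for the proprietary data: 637 rows, 2 target columns.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(637, 5))
y_train = np.column_stack([X_train[:, 0] + X_train[:, 1],
                           X_train[:, 2] - X_train[:, 3]])


def build_estimators(wrap_multioutput):
    """Rebuild the two pipelines, with or without the inner MultiOutputRegressor."""
    regressors = [('knn', KNeighborsRegressor(n_neighbors=10, weights='distance')),
                  ('svm', SVR(C=100.0))]
    estimators = []
    for name, reg in regressors:
        final = MultiOutputRegressor(reg) if wrap_multioutput else reg
        estimators.append((name, Pipeline(steps=[('imputation', SimpleImputer()),
                                                 ('scaler', RobustScaler()),
                                                 (name, final)])))
    return estimators


def fit_error(model):
    """Return the ValueError message raised by fit, or None if fit succeeds."""
    try:
        model.fit(X_train, y_train)
    except ValueError as exc:
        return str(exc)
    return None


# Layout from the question: the outer wrapper passes 1d columns to inner multi-output steps.
err_wrapped = fit_error(MultiOutputRegressor(VotingRegressor(build_estimators(True))))
# Bare VotingRegressor: it rejects the 2-column y before any estimator is fitted.
err_bare = fit_error(VotingRegressor(build_estimators(True)))

# Layout that fits: single-output pipelines inside, one multi-output wrapper outside.
model = MultiOutputRegressor(VotingRegressor(build_estimators(False)))
model.fit(X_train, y_train)
pred = model.predict(X_train[:5])

print(err_wrapped)
print(err_bare)
print(pred.shape)
```

With this layout, one VotingRegressor is cloned and fitted per target column, and each one averages the predictions of its KNN and SVR pipelines. The hyperparameters found by GridSearchCV would have to be carried over to the bare regressors by hand, as is done here with the values from the printout.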