I have a material property dataset where I use my features, X, to predict two mechanical properties, y (a dataframe with two columns). I cannot share the data for proprietary reasons. I have used a voting model before, but only when y had one dimension.
For this problem, I have successfully used GridSearchCV to find the best multi-output estimators for various regression models (e.g. SVR, KNN). Now I would like to combine those best estimators in a voting model (VotingRegressor). However, I keep receiving either ValueError: y must have at least two dimensions for multi-output regression but has only one or ValueError: y should be a 1d array, got an array of shape (637, 2) instead, with THE SAME array/dataframe for y.
I am unsure whether I am placing the MultiOutputRegressor in the right place relative to the voting model. First I wrapped the VotingRegressor in it:
cv = MultiOutputRegressor(VotingRegressor(estimators = best_classifiers, verbose=verbosity, n_jobs=-1))
cv.fit(X_train,y_train)
ValueError: y must have at least two dimensions for multi-output regression but has only one.
I then tried passing y as an explicitly two-column NumPy array:
print(y_train.to_numpy().reshape(-1,2)) #this shows an array with 2 columns
cv = VotingRegressor(estimators = best_classifiers, verbose=verbosity, n_jobs=-1)
wrapper = MultiOutputRegressor(cv)
wrapper.fit(X_train,y_train.to_numpy().reshape(-1,2))
ValueError: y must have at least two dimensions for multi-output regression but has only one.
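A small diagnostic sketch that may help narrow this down: it checks what shape of y MultiOutputRegressor hands to the estimator it wraps. ShapeRecorder, X_demo and y_demo are hypothetical names invented for this illustration, and the data is random, not my real data.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.multioutput import MultiOutputRegressor


class ShapeRecorder(RegressorMixin, BaseEstimator):
    """Hypothetical helper: remembers the shape of the y it is fitted on."""

    def fit(self, X, y):
        self.y_shape_ = np.shape(y)      # shape of the target this clone received
        self.mean_ = float(np.mean(y))   # trivial "model": predict the mean
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


# Random stand-in data with two target columns (assumption, not the real dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = rng.normal(size=(20, 2))

# MultiOutputRegressor fits one clone of the wrapped estimator per target column.
wrapper = MultiOutputRegressor(ShapeRecorder()).fit(X_demo, y_demo)
seen_shapes = [est.y_shape_ for est in wrapper.estimators_]
print(seen_shapes)
```

In this sketch each clone receives a single 1-D column of y, so whatever sits inside the outer MultiOutputRegressor (here, the VotingRegressor and everything nested in it) appears to be fitted on a 1-D target, even though the array passed to the outer fit has two columns.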
I also tried it without the MultiOutputRegressor wrapper:
cv = VotingRegressor(estimators = best_classifiers, verbose=verbosity, n_jobs=-1)
cv.fit(X_train,y_train)
ValueError: y should be a 1d array, got an array of shape (637, 2) instead.
For background, best_classifiers (simplified) prints as:
[('knn', Pipeline(steps=[('imputation', SimpleImputer()),
                         ('scaler', RobustScaler()),
                         ('knn', MultiOutputRegressor(estimator=KNeighborsRegressor(n_neighbors=10, weights='distance')))])),
 ('svm', Pipeline(steps=[('imputation', SimpleImputer()),
                         ('scaler', RobustScaler()),
                         ('svm', MultiOutputRegressor(estimator=SVR(C=100.0)))]))]
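Since I cannot share the data, here is a minimal sketch that should reproduce both errors on synthetic data. The random arrays, the choice of 5 features, and the helper names make_pipeline_for and fit_error are assumptions for illustration only. The pipelines are rebuilt by hand to mirror the printout above rather than taken from my actual GridSearchCV results.

```python
import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.impute import SimpleImputer
from sklearn.multioutput import MultiOutputRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVR

# Synthetic stand-in data: 637 rows as in the error message, 5 features assumed.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(637, 5))
y_train = rng.normal(size=(637, 2))


def make_pipeline_for(name, regressor):
    """Hypothetical helper mirroring the structure of the printed pipelines."""
    return Pipeline(steps=[
        ("imputation", SimpleImputer()),
        ("scaler", RobustScaler()),
        (name, MultiOutputRegressor(estimator=regressor)),
    ])


best_classifiers = [
    ("knn", make_pipeline_for("knn", KNeighborsRegressor(n_neighbors=10, weights="distance"))),
    ("svm", make_pipeline_for("svm", SVR(C=100.0))),
]


def fit_error(model):
    """Fit on the synthetic data; return the ValueError message, or None if fit succeeds."""
    try:
        model.fit(X_train, y_train)
    except ValueError as exc:
        return str(exc)
    return None


# Attempt 1: VotingRegressor wrapped in MultiOutputRegressor.
wrapped_error = fit_error(MultiOutputRegressor(VotingRegressor(estimators=best_classifiers)))
# Attempt 2: bare VotingRegressor on the two-column y.
bare_error = fit_error(VotingRegressor(estimators=best_classifiers))

print("wrapped:", wrapped_error)
print("bare:   ", bare_error)
```

Both attempts use the same X_train and y_train, which matches what I see with the real data: the two messages contradict each other even though y never changes.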