Y_train values for SymbolicRegressor


I split my dataset into X_train, Y_train, X_test and Y_test, and then I used the SymbolicRegressor...

I've already converted the string values in the DataFrame to float values. But when I apply the SymbolicRegressor, I get this error:

ValueError: could not convert string to float: 'd'

Where 'd' is a value from Y.

Since all the values in Y_train and Y_test are alphabetic characters (they are the "labels"), I cannot understand why the SymbolicRegressor tries to convert them to floats.

Any idea?


There are 2 best solutions below

1
On BEST ANSWER

According to the gplearn documentation (https://gplearn.readthedocs.io/en/stable/index.html), "Symbolic regression is a machine learning technique that aims to identify an underlying mathematical expression that best describes a relationship". Note the word "mathematical". I am not an expert on this topic, and gplearn's description does not clearly define its area of applicability or its restrictions.

However, according to the source code (https://gplearn.readthedocs.io/en/stable/_modules/gplearn/genetic.html), the fit() method of the BaseSymbolic class contains the line X, y = check_X_y(X, y, y_numeric=True), where check_X_y() is sklearn.utils.validation.check_X_y(). The y_numeric argument means: "Whether to ensure that y has a numeric type. If dtype of y is object, it is converted to float64. Should only be used for regression algorithms".

So y values must be numeric.
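To see the check in isolation, here is a minimal sketch that calls check_X_y() directly, without gplearn. The arrays are made-up illustration data, and it assumes y arrives with object dtype, as it typically does when it comes from a pandas column of strings:

```python
import numpy as np
from sklearn.utils.validation import check_X_y

# Made-up data for illustration: numeric features, string labels.
X = np.array([[0.1, 1.2], [0.4, 0.9], [0.7, 0.3]])
y_labels = np.array(["d", "a", "b"], dtype=object)  # object dtype, like a pandas string column

# With y_numeric=True, an object-dtype y is cast to float64, which fails for letters.
try:
    check_X_y(X, y_labels, y_numeric=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# A numeric target passes the same check unchanged.
X_ok, y_ok = check_X_y(X, np.array([1.0, 2.0, 3.0]), y_numeric=True)
print(y_ok.dtype)
```

The first call fails with the same kind of ValueError as in the question, which suggests the error comes from this validation step rather than from the features in X.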

0
On

Sorry for the late reply. gplearn supports regression (numeric y) with the SymbolicRegressor estimator, and with the newly released gplearn 0.4.0 we also support binary classification (two labels in y) using the SymbolicClassifier. From the sound of things, though, you have a multi-class problem (more than two labels in y), which gplearn does not currently support. It may be something we look to support in the future.
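As a rough way to check which of these cases a given y falls into, the sketch below uses scikit-learn's type_of_target(). The helper suggest_gplearn_estimator() is hypothetical (it is not part of gplearn), and the mapping it returns only restates the support described above:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target


def suggest_gplearn_estimator(y):
    """Hypothetical helper: map the kind of target to the estimator named above."""
    kind = type_of_target(y)
    if kind == "continuous":
        return "SymbolicRegressor"
    if kind == "binary":
        return "SymbolicClassifier"
    # Anything else (e.g. "multiclass") is not covered by the estimators mentioned above.
    return "unsupported: " + kind


# Made-up targets for illustration.
print(suggest_gplearn_estimator(np.array([0.5, 1.7, 2.3])))       # numeric, non-integer values
print(suggest_gplearn_estimator(np.array(["a", "b", "a"])))       # two distinct labels
print(suggest_gplearn_estimator(np.array(["a", "b", "d", "a"])))  # more than two labels
```

Letter labels such as 'd' with more than two distinct values land in the last branch, which matches the situation described in the question.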