I split my dataset into `X_train`, `Y_train`, `X_test` and `Y_test`, and then I used the `SymbolicRegressor`...
I have already converted the string values in the DataFrame to float values.
But when I apply the `SymbolicRegressor`, I get this error:
ValueError: could not convert string to float: 'd'
Where 'd' is a value from Y.
Since all the values in `Y_train` and `Y_test` are alphabetic characters, because they are the "labels", I cannot understand why the `SymbolicRegressor` tries to get a float number.
Any idea?
According to the gplearn documentation (https://gplearn.readthedocs.io/en/stable/index.html), "Symbolic regression is a machine learning technique that aims to identify an underlying mathematical expression that best describes a relationship". Pay attention to the word "mathematical". I am not an expert on this topic, and gplearn's description does not clearly define its area of applicability or its restrictions.
However, according to the source code (https://gplearn.readthedocs.io/en/stable/_modules/gplearn/genetic.html), the `fit()` method of the `BaseSymbolic` class contains the line `X, y = check_X_y(X, y, y_numeric=True)`, where `check_X_y()` is `sklearn.utils.validation.check_X_y()`. The `y_numeric` argument means: "Whether to ensure that y has a numeric type. If dtype of y is object, it is converted to float64. Should only be used for regression algorithms".
So the `y` values must be numeric.
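To make this concrete, here is a minimal sketch that reproduces the validation step outside of gplearn and shows one possible workaround. The toy data and variable names are made up for illustration, and `LabelEncoder` is my suggestion, not something the gplearn documentation prescribes. Note that integer codes impose an artificial ordering on the labels, so this only makes sense if treating the classes as numbers is acceptable for your problem.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.validation import check_X_y

# Toy data (hypothetical): numeric features, string labels.
X = np.array([[0.1, 1.0], [0.2, 0.9], [0.3, 0.8], [0.4, 0.7]])
# Object dtype, which is what a pandas column of strings typically has.
y_labels = np.array(["a", "b", "c", "d"], dtype=object)

# The same validation call that BaseSymbolic.fit() makes on its inputs.
try:
    check_X_y(X, y_labels, y_numeric=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Possible workaround: map each label to an integer code before fitting.
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_labels)

# The numeric target now passes the check.
X_checked, y_checked = check_X_y(X, y_encoded, y_numeric=True)
print(y_checked)

# Integer codes can be mapped back to the original labels afterwards.
print(encoder.inverse_transform(y_checked))
```

If the task is really classification, a classifier is probably the better tool than a regressor fed with encoded labels. Recent gplearn versions also ship a `SymbolicClassifier`, which as far as I know only handles binary problems, so check the documentation of your installed version first.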