How to fix "ValueError: Classification with data of type continuous is not supported."


I am trying to use auto-sklearn on some pandas data, and when I run:

model.fit(X_train, y_train)

this error pops up:

ValueError                                Traceback (most recent call last)
<ipython-input-10-ed5cd6b32087> in <module>
      2 #         sklearn.model_selection.train_test_split(X, y, random_state=1)
      3 
----> 4 model.fit(X_train, y_train)
~/notebook/jupyterenv/lib/python3.6/site-packages/autosklearn/estimators.py in fit(self, X, y, X_test, y_test, feat_type, dataset_name)
    660                              "".format(
    661                                     target_type,
--> 662                                     supported_types
    663                                 )
    664                              )
ValueError: Classification with data of type continuous is not supported. Supported types are ['binary', 'multiclass', 'multilabel-indicator']. You can find more information about scikit-learn data types in: https://scikit-learn.org/stable/modules/multiclass.html

My (X, y) data looks something like this (the headers HOMO/LUMO etc. are descriptors):

HOMO (A)  HOMO (AH)  LUMO (A)  LUMO (AH)  charge (AH)  Charge metal (A)  \
0     -7.8453    -9.6920   -4.2406    -6.9161            1            -0.938   
1     -7.7330    -9.6774   -4.0690    -6.9602            1            -0.911   
2     -7.6751    -9.6051   -3.9238    -6.8990            1            -0.950   
3     -8.1345    -9.8027   -6.3221    -7.5155            1            -0.868   
4     -7.9405    -9.4709   -5.7324    -6.9515            1            -0.880   
..        ...        ...       ...        ...          ...               ...   
164   -7.5867    -9.7576   -5.1992    -6.8152            1            -0.312   
165   -8.3700   -10.1670   -6.6819    -7.8044            1            -0.311   
166   -8.3445   -10.0288   -6.6499    -7.5991            1            -0.321   
167   -7.9764   -10.0586   -6.3554    -7.5688            1            -0.277   
168   -7.9317    -9.9008   -6.3104    -7.3790            1            -0.288   

    
[169 rows x 17 columns] [24.4 23.8 24.  14.2 22.5 18.5 19.4 17.4 22.6 16.3 20.3 13.2 16.5 21.2
 24.6 17.3 23.3 22.2 18.  31.1 29.7 30.4 22.  23.2 22.1 27.6 22.9 19.8
 18.3 18.5 44.8 39.4 46.  49.9 35.  22.5 32.  22.8 38.1 23.6 23.3 18.4
 15.6 11.3 13.3 13.9 16.1 20.8 23.  20.4  8.3 11.3 11.4 15.1 15.4 17.1
 18.7 21.1 26.6 23.  20.4 21.6 26.8  9.  11.4 32.7 -1.6 -0.3 -1.3 -0.4
 -3.9  1.   5.6  0.5  0.   4.5  6.8  7.8  4.2  1.1  4.2  5.5  0.8 12.
 17.   5.8 17.  26.1 27.2 31.9  0.5  1.5  8.5  7.1 25.5 40.  -5.7 -6.
 12.5  4.4 -5.  -1.3 -5.  -5.  -5.  -5.  -0.6 -0.6  2.   3.6  3.2  0.1
  2.1  4.5 11.   2.7  3.5 -2.   1.2  9.3  2.6  7.1  6.1  3.2  5.1  7.5
  1.8  4.3  4.4  0.8  9.9  7.6  7.9  8.9 10.  10.9 11.8  9.9 13.4 13.4
  8.8  2.1  6.   7.1 -1.1  0.5  0.3  4.7  6.   6.5  8.  11.6  6.9  8.4
  8.7  7.2  6.3  6.4  7.4 12.1 10.4 11.1 12.2 14.3 16.3  8.1  8.5  8.6
  9. ]

There are 1 best solutions below

Adept On

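As the error explains, you are giving continuous data to a model that only handles binary, multiclass, or multilabel-indicator targets. The check that fails is on the target type (the traceback formats `target_type` into the message), so the data in question is y_train, whose values are floats such as 24.4 and 23.8. Which model are you using? Check its documentation to see how it works and which kinds of target it does or does not handle.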
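To see which category a target falls into, scikit-learn exposes `sklearn.utils.multiclass.type_of_target`, and its return values match the type names in the error message. A minimal sketch, assuming scikit-learn and NumPy are installed; the first array reuses a few values from the y printed in the question, while the two label arrays are hypothetical and only there for comparison:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# A few target values copied from the y array printed in the question.
y_continuous = np.array([24.4, 23.8, 24.0, 14.2, 22.5, 18.5, 19.4, 17.4])

# Hypothetical class labels, for comparison only (not from the question).
y_binary = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_multiclass = np.array([0, 1, 2, 1, 0, 2, 1, 0])

continuous_kind = type_of_target(y_continuous)
binary_kind = type_of_target(y_binary)
multiclass_kind = type_of_target(y_multiclass)

print(continuous_kind, binary_kind, multiclass_kind)
```

Running the same check on your own y_train before calling fit shows whether a classifier can accept it.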

-- FOLLOWING THE COMMENT

A continuous variable is one that can take an infinite (or very large) number of values. Your data is a good example: it is made of floats, some with 4 decimals, so it is clearly continuous. By contrast, a binary variable is only '1' or '0', and a categorical variable takes a finite number of categories (like 'January', 'February', ..., 'December', so only 12 possible categories). Many kinds of models handle continuous variables, while others accept ONLY categorical ones, so if you have no constraint on your model, you can switch to one of the former kind.
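As an illustration of both options, the sketch below fits a regressor on a continuous target, then shows how the same target could be binned into classes if classification is really what is wanted. It uses plain scikit-learn so that it runs without auto-sklearn; auto-sklearn also provides a regression estimator, `autosklearn.regression.AutoSklearnRegressor`, which would be the closer replacement for the model in the question. The data is a random placeholder with the same shape as the question's (169 rows, 17 columns), and the bin thresholds are arbitrary assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.utils.multiclass import type_of_target

# Placeholder data with the same shape as in the question (NOT the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(169, 17))
y = 5.0 * X[:, 0] + rng.normal(size=169)  # continuous target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Option 1: keep the continuous target and use a regression model.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Option 2: turn the target into classes by binning it.
# The thresholds are hypothetical; choose ones that are meaningful for your data.
bin_edges = np.array([-2.0, 2.0])
y_binned = np.digitize(y, bin_edges)  # 0 = low, 1 = medium, 2 = high
binned_kind = type_of_target(y_binned)

print(predictions.shape, binned_kind)
```

Binning discards information about the exact values, so regression is usually the more natural choice when the quantity being predicted is itself a measurement.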