KerasClassifier error with categorical data

I am trying to build a neural network for categorical data in Python (3.5).

I have a table with 47 independent variables (X) and a table with a single column holding the dependent variable (y). This variable is categorical and takes one of three possible values. I therefore encode it with LabelEncoder() so that it becomes 0, 1 or 2. Then I expand those numbers into three columns with OneHotEncoder and delete the last column. The reason: two columns of 0s and 1s are enough to distinguish three possible outcomes.
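To make the encoding steps concrete, here is a minimal library-free sketch of what they produce. It is not the scikit-learn implementation; the helper names `label_encode` and `one_hot` and the sample labels are hypothetical and only mimic the behaviour described above (classes sorted, then mapped to 0, 1, 2).

```python
def label_encode(labels):
    """Map each distinct label to an integer, in sorted order (assumed to mirror LabelEncoder)."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[label] for label in labels], classes


def one_hot(codes, n_classes):
    """Turn integer codes into rows with a single 1.0 in the column of the class."""
    return [[1.0 if code == k else 0.0 for k in range(n_classes)] for code in codes]


# Hypothetical sample of the dependent variable
labels = ['first', 'second', 'third', 'second']

codes, classes = label_encode(labels)
y_onehot = one_hot(codes, len(classes))

# Dropping the last column: the third class becomes the all-zero row
y_dropped = [row[:-1] for row in y_onehot]

print(codes)
print(y_onehot)
print(y_dropped)
```

Note that after dropping the last column the rows no longer sum to 1 for the third class, whereas a softmax output over three units always sums to 1.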

For the neural network I use softmax on the output layer and categorical_crossentropy as the loss function (this should be the right choice for categorical data).

When I run my code, I get this error:

 classification.py in _check_targets(y_true=array([[ 1.,  0.,  0.],
   [ 0.,  1.,  0.],
...
   [ 0.,  0.,  1.],
   [ 0.,  1.,  0.]]), y_pred=array([2, 2, 2, 2, 2]))
 77     if y_type == set(["binary", "multiclass"]):
 78         y_type = set(["multiclass"])
 79 
 80     if len(y_type) > 1:
 81         raise ValueError("Can't handle mix of {0} and {1}"
---> 82                          "".format(type_true, type_pred))
    type_true = 'multilabel-indicator'
    type_pred = 'binary'
 83 
 84     # We can't have more than one value on y_type => The set is no more needed
 85     y_type = y_type.pop()
 86 

ValueError: Can't handle mix of multilabel-indicator and binary

I don't understand the error: type_true is probably the type of the true data (the real data that I have), and I can see that they are binary.
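Looking at the values in the traceback, y_true is a matrix of one-hot rows while y_pred is a flat array of class indices, so the two are not in the same representation. The sketch below only illustrates that difference in plain Python, under the assumption that the scorer needs both sides as class indices; `to_class_indices` is a hypothetical helper, and the arrays are shortened versions of the ones in the error message.

```python
def to_class_indices(one_hot_rows):
    """Return, for each row, the position of its largest entry (an argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in one_hot_rows]


# One-hot targets, as in y_true from the traceback (shortened)
y_true = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0],
          [0.0, 1.0, 0.0]]

# Predicted class indices, as in y_pred from the traceback (shortened)
y_pred = [2, 2, 2, 2]

# Once both sides are class indices, they can be compared element by element
y_true_idx = to_class_indices(y_true)
accuracy = sum(t == p for t, p in zip(y_true_idx, y_pred)) / len(y_pred)

print(y_true_idx)
print(accuracy)
```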

P.S.

If I remove two columns from y instead of one (so that only one column is left) and use a sigmoid output with the binary_crossentropy loss function, I don't get any error. So the data preparation seems to be OK?

P.P.S.

My code looks like this:

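from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# y is like [['first'], ['second'], ['third'],...]
# Encode the string labels as the integers 0, 1, 2
labelencoder_y_1 = LabelEncoder()
y[:, 0] = labelencoder_y_1.fit_transform(y[:, 0])

# Expand the integer labels into one indicator column per class
onehotencoder_y = OneHotEncoder(categorical_features = [0])
y = onehotencoder_y.fit_transform(y).toarray()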



# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2,
                                                    random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Tuning the ANN
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense

def build_classifier(optimizer, units, layers):
    classifier = Sequential()
    classifier.add(Dense(units = units, kernel_initializer = 'uniform', activation = 'relu', input_dim = 47))
    for i in range(layers):
        classifier.add(Dense(units = units, kernel_initializer = 'uniform', activation = 'relu'))
    classifier.add(Dense(units = 3, kernel_initializer = 'uniform', activation = 'softmax'))
    classifier.compile(optimizer = optimizer, loss = 'categorical_crossentropy', metrics = ['accuracy'])
    return classifier

classifier = KerasClassifier(build_fn = build_classifier)


parameters = {'batch_size': [32],
          'epochs': [64],
          'optimizer': ['rmsprop'],
          'units': [16],
          'layers': [2]}

grid_search = GridSearchCV(estimator = classifier,
                       param_grid = parameters,
                       scoring = 'accuracy',
                       cv = 10,
                       n_jobs = 3)
grid_search = grid_search.fit(X_train, y_train)
best_parameters = grid_search.best_params_
best_accuracy = grid_search.best_score_

EDIT: I edited my question to keep all three columns, following a comment from @djk47463.
