Multiple Categorical Input Variables in TensorFlow

I have a dataset in which each feature vector has 50 features, 45 of which are categorical. I am having trouble feeding the categorical variables into TensorFlow. I found an example tutorial for TensorFlow with categorical variables, but I do not understand how to adapt it to a dataset that has both categorical and continuous data and multiple features. My first attempt is below, but it does not encode the majority of the variables.

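import random

import numpy as np
import skflow
from sklearn import cross_validation, metrics

# databank is my own data-loading module (not shown here)
input_classes, input_gradients, outputs = databank.get_dataset()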

print("Creating feature matrix")
inputs = np.array(input_classes, dtype=np.int32)
outputs = np.array(outputs, dtype=np.int32)
random.seed(42)
input_train, input_test, output_train, output_test = cross_validation.train_test_split(inputs, outputs, test_size=0.2, random_state=42)

print("Creating DNN")
# Prepare the neural net
def my_model(X, y):
    # DNN with three hidden layers of 10, 20 and 10 units, and a dropout keep probability of 0.5
    layers = skflow.ops.dnn(X, [10, 20, 10], keep_prob=0.5)
    return skflow.models.logistic_regression(layers, y)


classifier = skflow.TensorFlowEstimator(model_fn=my_model, n_classes=2)

print("Testing DNN")
# Test the neural net
classifier.fit(input_train, output_train)
score = metrics.accuracy_score(output_test, classifier.predict(input_test))
print("Accuracy: %f" % score)

I think the real problem is that I don't really understand how to handle the input 'tensor' X passed to the my_model function in the code above.

There are 1 best solutions below


Use a categorical processor to map your categories to integers before feeding them to the model, like so:

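# Fit the category-to-integer mapping on the training data only, then reuse it for the test data
cat_processor = skflow.preprocessing.CategoricalProcessor()
X_train = np.array(list(cat_processor.fit_transform(X_train)))
X_test = np.array(list(cat_processor.transform(X_test)))
# Size of the vocabulary learned for the first categorical column
n_classes = len(cat_processor.vocabularies_[0])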
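
For the mixed case in the question (categorical and continuous columns together), here is a rough sketch of the same idea written with plain NumPy rather than skflow, so it does not depend on any particular skflow version. The helper names `encode_categorical` and `one_hot` and the toy data are made up for illustration and are not part of any library. Each categorical column gets its own vocabulary, the integer ids are expanded to one-hot vectors, and the result is concatenated with the continuous columns to form one feature matrix.

```python
import numpy as np


def encode_categorical(columns):
    """Map every categorical column to integer ids, one vocabulary per column."""
    ids = np.empty(columns.shape, dtype=np.int64)
    vocabularies = []
    for j in range(columns.shape[1]):
        # np.unique returns the sorted distinct values and, for each row,
        # the index of its value in that sorted vocabulary.
        vocab, inverse = np.unique(columns[:, j], return_inverse=True)
        ids[:, j] = inverse.reshape(-1)
        vocabularies.append(vocab)
    return ids, vocabularies


def one_hot(ids, vocabularies):
    """Expand each integer-id column into a one-hot block and stack the blocks."""
    blocks = [
        np.eye(len(vocab), dtype=np.float32)[ids[:, j]]
        for j, vocab in enumerate(vocabularies)
    ]
    return np.hstack(blocks)


# Hypothetical toy data: two categorical columns and one continuous column.
categorical = np.array([["red", "small"],
                        ["blue", "large"],
                        ["red", "large"]])
continuous = np.array([[0.5], [1.5], [2.5]], dtype=np.float32)

ids, vocabularies = encode_categorical(categorical)

# One feature matrix holding both kinds of input.
features = np.hstack([one_hot(ids, vocabularies), continuous])
print(features.shape)
```

A matrix built this way could then be passed as the input to `classifier.fit` in place of the raw integer array. Note that this sketch fits the vocabularies on whatever data it is given; for a train/test split you would want to build the vocabularies from the training set and reuse them for the test set, as the categorical processor above does.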