Neural Nets with Mixed Real-valued and Categorical Input Features


My question has three parts:

(1) Can a feedforward neural network handle mixed input features, where some are categorical (discrete-valued, e.g., Low, Med, High) and some are real-valued? There are about 80 to 90 input features in total, and I want to solve a supervised classification problem.

(2) If the answer to (1) is yes: I have read about representing discrete-valued features with binary codes (Low = 001, Med = 010, High = 100, etc.) in other contexts. Will that work for neural networks as well? I am also concerned about scaling/normalizing the input feature vector, which I understand is recommended. How should I scale or normalize a mixed feature vector, or is that not required?

(3) Someone suggested that I use a Random Forest (RF) instead. I am not very familiar with RFs. What are the pros and cons of using an RF versus a neural network in this context?




As far as point (2) goes: if you transform each categorical input into a k-vector (with k = the number of categories that input can take), you are just introducing k new inputs, each taking the value 0 or 1. So if your real-valued input features are themselves scaled to the range [0, 1], the whole input vector ends up on a consistent scale and you're fine.
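A minimal sketch of that idea in plain Python, assuming one categorical input with the levels Low/Med/High and a couple of real-valued inputs. The names `LEVELS`, `one_hot`, `min_max_scale`, and `encode_rows` are hypothetical helpers for illustration, and min-max scaling is just one way to bring the real-valued columns into [0, 1].

```python
from typing import Sequence

# Hypothetical ordering of the levels; any fixed ordering works.
LEVELS = ["Low", "Med", "High"]


def one_hot(index: int, k: int) -> list[float]:
    """Encode category `index` (0..k-1) as a k-vector with a single 1.0."""
    if not 0 <= index < k:
        raise ValueError(f"category index {index} out of range for k={k}")
    return [1.0 if i == index else 0.0 for i in range(k)]


def min_max_scale(column: Sequence[float]) -> list[float]:
    """Rescale one real-valued column linearly into [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]  # constant column: map every value to 0
    return [(x - lo) / span for x in column]


def encode_rows(levels: Sequence[str], reals: Sequence[Sequence[float]]) -> list[list[float]]:
    """Concatenate the one-hot categorical input with min-max-scaled real inputs.

    levels[i] is the categorical value of sample i; reals[i] holds its real-valued features.
    """
    k = len(LEVELS)
    n_real = len(reals[0])
    # Scale each real-valued column independently (column j across all samples).
    scaled_cols = [min_max_scale([row[j] for row in reals]) for j in range(n_real)]
    encoded = []
    for i, level in enumerate(levels):
        cat_part = one_hot(LEVELS.index(level), k)
        real_part = [scaled_cols[j][i] for j in range(n_real)]
        encoded.append(cat_part + real_part)
    return encoded


X = encode_rows(["Low", "High", "Med"], [[10.0, 0.5], [30.0, 1.5], [20.0, 1.0]])
for row in X:
    print(row)
# [1.0, 0.0, 0.0, 0.0, 0.0]
# [0.0, 0.0, 1.0, 1.0, 1.0]
# [0.0, 1.0, 0.0, 0.5, 0.5]
```

Every entry of the resulting vectors lies in [0, 1]. In a real pipeline you would compute the minimum and maximum on the training set only and reuse those values to scale validation and test data.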

Note that if you are using a tanh activation function (whose outputs range from -1 to 1), you should encode your categorical input features to match that range. For example, with k = 3:

0 should become <1, -1, -1>

1 should become <-1, 1, -1>

2 should become <-1, -1, 1>
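The ±1 encoding above can be sketched the same way. `signed_one_hot` is a hypothetical helper that reproduces the three vectors listed; `to_tanh_range` is an additional assumption, showing how a real-valued feature already scaled to [0, 1] could be mapped linearly onto [-1, 1] so that all inputs share the tanh range.

```python
def signed_one_hot(index: int, k: int) -> list[float]:
    """Encode category `index` as a k-vector of -1.0 with a single +1.0."""
    if not 0 <= index < k:
        raise ValueError(f"category index {index} out of range for k={k}")
    return [1.0 if i == index else -1.0 for i in range(k)]


def to_tanh_range(x01: float) -> float:
    """Map a value already scaled to [0, 1] linearly onto [-1, 1]."""
    return 2.0 * x01 - 1.0


for c in range(3):
    print(c, signed_one_hot(c, 3))
# 0 [1.0, -1.0, -1.0]
# 1 [-1.0, 1.0, -1.0]
# 2 [-1.0, -1.0, 1.0]
```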

Hope I'm clear about that.