pandas get_dummies: how to remember which value became which new dummy column?


It seems quick and easy to one-hot encode multiple categorical variables at once with the get_dummies method, but how do I remember which column is which, so that my test data ends up with the same features as my training data? For example:

My training dataset has a CATEGORICAL feature:

   X
   cat
   dog
   lion
   lion

After get_dummies, I get something like this:

   X_1   X_2   X_3
    1     0     0
    0     1     0
    0     0     1
    0     0     1

After training the model, I am ready to test my awesome magic model, and here is the test data:

   X
   cat
   cat
   lion

If I apply pd.get_dummies to it, I get something like this:

   X_1   X_2
    1     0
    1     0
    0     1

This is inconsistent with the features of my training data, so I simply can't apply my model to the test data.
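The mismatch can be reproduced in a few lines. get_dummies only creates columns for the values it sees in the frame it is given, so a category absent from the test set ("dog" here) simply has no column.

```python
import pandas as pd

train = pd.DataFrame({"X": ["cat", "dog", "lion", "lion"]})
test = pd.DataFrame({"X": ["cat", "cat", "lion"]})

train_dummies = pd.get_dummies(train, columns=["X"], dtype=int)
# "dog" never occurs in the test set, so no X_dog column is created
test_dummies = pd.get_dummies(test, columns=["X"], dtype=int)

print(list(train_dummies.columns))  # ['X_cat', 'X_dog', 'X_lion']
print(list(test_dummies.columns))   # ['X_cat', 'X_lion']
```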

Any suggestions so that I can get something like the following?

   X_1   X_2   X_3
    1     0     0
    1     0     0
    0     0     1
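One pandas-only way to get exactly the table above, sketched under the assumption that the test set should be forced onto the training layout: encode the training data, remember its columns, then `reindex` the encoded test data to those columns. Missing columns are added and filled with 0, and columns for categories never seen in training are dropped. Because `pd.get_dummies(df)` encodes every object/category column in one call, this also covers many categorical features at once.

```python
import pandas as pd

train = pd.DataFrame({"X": ["cat", "dog", "lion", "lion"]})
test = pd.DataFrame({"X": ["cat", "cat", "lion"]})

# "Fit": encode the training data and remember its column layout
train_dummies = pd.get_dummies(train, dtype=int)
train_columns = train_dummies.columns

# "Transform": encode the test data, then align it to the training columns.
# X_dog is missing from the test encoding, so reindex adds it filled with 0;
# any test-only category column would be dropped.
test_dummies = pd.get_dummies(test, dtype=int).reindex(
    columns=train_columns, fill_value=0
)

print(test_dummies)
```

Dropping test-only columns means an unseen category is encoded as all zeros, which is usually what a model trained on the original columns expects.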

How can I get fit/transform functionality? Again, I have over 50 categorical features, and I can't apply LabelEncoder and then OneHotEncoder to them one by one.

Any suggestions? Thank you.

Answer:

I apply get_dummies to all of the data (training and test together), and only after that do I split it into training and test sets.
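A minimal sketch of this approach, assuming both sets are available as DataFrames at encoding time. Using `keys` in `pd.concat` to label the rows and split them back afterwards is one convenient choice, not necessarily the answerer's exact code.

```python
import pandas as pd

train = pd.DataFrame({"X": ["cat", "dog", "lion", "lion"]})
test = pd.DataFrame({"X": ["cat", "cat", "lion"]})

# Stack both sets so get_dummies sees every category occurring in either
combined = pd.concat([train, test], keys=["train", "test"])
combined_dummies = pd.get_dummies(combined, dtype=int)

# Split back using the outer index level created by `keys`
train_dummies = combined_dummies.loc["train"]
test_dummies = combined_dummies.loc["test"]

print(list(test_dummies.columns))  # ['X_cat', 'X_dog', 'X_lion']
print(test_dummies)
```

This only works when the test data is known before encoding; for data that arrives later, the encoding has to be reproduced from the training columns instead.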