I am trying to one-hot encode my data frame. Some of its columns hold lists of values rather than single values, and I am not sure how to handle that. The data frame may look like this:
df = pd.DataFrame({'menu': [['Italian', 'Greek'], ['Japanese'], ['Italian','Greek', 'Japanese']], 'price': ['$$', '$$', '$'], 'location': [['NY', 'CA','MI'], 'CA', ['NY', 'CA','MA']]})
The output I want is something like this:
df2 = pd.DataFrame({'menu': [[1,1,0], [0,0,1], [1,1,1]], 'price': [[1,0], [1,0], [0,1]], 'location': [[1,1,1,0], [0,1,0,0], [1,1,0,1]]})
I am not sure how this can be done using pd.get_dummies or scikit-learn. Can someone help me?
You can use:
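One possible approach, sketched in plain pandas: wrap each scalar cell (such as '$$' or the bare 'CA') in a list, collect the categories of each column in order of first appearance, and then build a 0/1 list per row. The helper names `as_list` and `multi_hot` are made up for this illustration and are not part of pandas or scikit-learn.

```python
import pandas as pd

df = pd.DataFrame({
    'menu': [['Italian', 'Greek'], ['Japanese'], ['Italian', 'Greek', 'Japanese']],
    'price': ['$$', '$$', '$'],
    'location': [['NY', 'CA', 'MI'], 'CA', ['NY', 'CA', 'MA']],
})


def as_list(value):
    # Hypothetical helper: wrap scalars such as '$$' or 'CA' so every cell is a list.
    return list(value) if isinstance(value, (list, tuple, set)) else [value]


def multi_hot(column):
    # Hypothetical helper: encode one column as a list of 0/1 flags per row.
    cells = column.map(as_list)
    # Categories in order of first appearance (assumed, to match the desired output).
    categories = list(dict.fromkeys(label for cell in cells for label in cell))
    return cells.map(lambda cell: [int(c in cell) for c in categories])


# Encode every column independently and reassemble the frame.
df2 = pd.DataFrame({name: multi_hot(df[name]) for name in df.columns})
print(df2)
```

For the sample data this gives the same lists as the df2 in the question. scikit-learn's MultiLabelBinarizer does a similar job, but it sorts the classes alphabetically unless you pass `classes` yourself, so the column order would differ from the one you listed.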