I have a dataframe with several columns. One column is "category", which is a space-separated string. A sample of the df's category column is:
3 36 211 433 474 533 690 980
3 36 211
3 16 36 211 396 398 409
3 35 184 590 1038
67 179 208 1008 5000 5237
I have another list of categories dict = [3,5,7,8,16,36,5000]. What I would like to see is a new data frame with the dict entries as columns, and 0/1 as entries. If a row in df contains the dict entry, it's 1; otherwise it's 0. So the output is:
3 5 7 8 16 36 5000
1 0 0 0 0 1 0
1 0 0 0 0 1 0
1 0 0 0 1 1 0
1 0 0 0 0 0 0
0 0 0 0 0 0 1
Have tried something like:
for cat in level_0_cat:
    df[cat] = df.apply(lambda x: int(cat in map(int, x.category.split())), axis=1)
But this does not work for a large dataset (10 million rows). I have also tried isin, but have not figured it out. Any ideas are appreciated.
This ought to do it.
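A minimal sketch of one possible approach, not necessarily the exact code intended here: pad each category string with spaces and test for each wanted ID as a whole token using the vectorized `str.contains`, which avoids a row-wise `apply`. The names `wanted`, `padded`, and `indicators` are illustrative, and the list includes 36 so that it matches the columns of the expected output in the question.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "category": [
        "3 36 211 433 474 533 690 980",
        "3 36 211",
        "3 16 36 211 396 398 409",
        "3 35 184 590 1038",
        "67 179 208 1008 5000 5237",
    ]
})

# Hypothetical name for the list of categories of interest.
wanted = [3, 5, 7, 8, 16, 36, 5000]

# Pad with spaces so every ID is delimited on both sides;
# searching for " 3 " then cannot match inside "35" or "533".
padded = " " + df["category"] + " "

# One vectorized substring test per wanted category.
indicators = pd.DataFrame(
    {cat: padded.str.contains(f" {cat} ", regex=False).astype(int)
     for cat in wanted}
)
print(indicators)
```

This does one pass over the column per wanted category, so the cost grows with the length of the list rather than with the number of distinct IDs in the data. Whether that is fast enough for 10 million rows would need to be measured.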
Edit: another way to do this without directly calling any pandas indexing methods like ix or loc.
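A possible sketch of such an approach, offered as an assumption rather than the original code: `Series.str.get_dummies` builds a 0/1 column for every distinct token, and `reindex` then keeps only the wanted columns, filling absent ones with 0. The names `wanted`, `dummies`, and `result` are illustrative, and 36 is included so the columns match the expected output in the question.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "category": [
        "3 36 211 433 474 533 690 980",
        "3 36 211",
        "3 16 36 211 396 398 409",
        "3 35 184 590 1038",
        "67 179 208 1008 5000 5237",
    ]
})

# Hypothetical name for the list of categories of interest.
wanted = [3, 5, 7, 8, 16, 36, 5000]

# One indicator column per distinct token; column labels are strings.
dummies = df["category"].str.get_dummies(sep=" ")

# Keep only the wanted columns, in order; IDs that never occur
# in the data (5, 7, 8 here) become all-zero columns.
result = dummies.reindex(columns=[str(c) for c in wanted], fill_value=0)
print(result)
```

Note that `get_dummies` materializes a column for every distinct ID before `reindex` trims it, so on a very large frame with many distinct categories this may use far more memory than testing only the wanted IDs.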