I have a dict similar to this:
dict = {'color':['red', 'blue', 'green'], 'fruits':['apple', 'banana', 'grape'], 'animal':['cat', 'dog']}
and a df with two columns; the text column holds multiple comma-separated strings:
index | text
-------------------------------
a     | house, chair, green
b     | yellow, banana, wall
c     | dog, brown, grass
-------------------------------
I would like to add an extra column to df holding the dict key whenever any string in the text column matches one of that key's values in dict.values, so for a: color, b: fruits, c: animal.
I was trying with isin for lists, but thought maybe a dict would be more efficient? Any help appreciated.
The easiest way would be to use apply(). However, keep in mind that apply() is poorly optimized: it's roughly equivalent to using a for loop to call the function on each row. If performance is a concern, you might consider inverting your dictionary like
{"red":"color", "blue":"color" ...}
and writing a simpler function to apply, one that only has to look up each word in the inverted dictionary.
You could also consider using one of the optimized pandas functions for series of strs, like str.extract(), assuming that df["text"] is a series of strs, not lists of strs. There are no optimized pandas functions for series of lists, and it's generally a bad idea to keep lists in DataFrames if performance is a priority.
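As a rough sketch of both suggestions, assuming df["text"] holds comma-separated strs and that the key of the first matching word is the one you want. The names categories, word_to_key, first_match, category and category_extract are made up for illustration, and the dict is renamed so it doesn't shadow the built-in dict:

```python
import re

import pandas as pd

# Data from the question; `categories` replaces the name `dict` (assumed rename).
categories = {
    "color": ["red", "blue", "green"],
    "fruits": ["apple", "banana", "grape"],
    "animal": ["cat", "dog"],
}
df = pd.DataFrame(
    {"text": ["house, chair, green", "yellow, banana, wall", "dog, brown, grass"]},
    index=["a", "b", "c"],
)

# Invert the dict so that every word points at its key: {"red": "color", ...}
word_to_key = {word: key for key, words in categories.items() for word in words}


def first_match(text):
    """Return the key of the first word in `text` found in word_to_key, else None."""
    for word in text.split(","):
        key = word_to_key.get(word.strip())
        if key is not None:
            return key
    return None


# Option 1: apply() with one dict lookup per word.
df["category"] = df["text"].apply(first_match)

# Option 2: str.extract() pulls out the first known word, map() turns it into its key.
pattern = r"\b(" + "|".join(re.escape(word) for word in word_to_key) + r")\b"
df["category_extract"] = df["text"].str.extract(pattern, expand=False).map(word_to_key)

print(df)
```

Rows with no matching word end up with None in the first column and NaN in the second. If a row can match several keys, both versions keep only the first match, so you would need to adjust them (for example with str.findall()) if you want all of them.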