Iterate through a column to match dict values


I have a dict similar to this:

dict = {'color':['red', 'blue', 'green'], 'fruits':['apple', 'banana', 'grape'], 'animal':['cat', 'dog']}

and a df with two columns, where the text column holds several comma-separated strings:

index   | text
-------------------------------
a       | house, chair, green
-------------------------------
b       | yellow, banana, wall
-------------------------------
c       | dog, brown, grass
-------------------------------

I would like to add an extra column to df holding the dict key whose values contain any string from the text column, so a gets color, b gets fruits, and c gets animal.

I was trying isin with lists, but thought that using the dict might be more efficient. Any help appreciated.


There is 1 best solution below


The easiest way would be to use apply().

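# type_dict is the dictionary from the question; each cell of df["text"]
# is assumed to be a list of strings.
def get_type(input_strs):
    # .items() is needed to iterate over key/value pairs
    for key, val in type_dict.items():
        for input_str in input_strs:
            if input_str in val:
                return key
    return None  # no value matched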

df["str_type"] = df["text"].apply(get_type)
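
As a minimal end-to-end sketch of the apply() approach on the question's sample data: it assumes the text column holds comma-separated strings (as shown in the question), so each cell is split into a list first. The helper split_cell and the column name str_type are illustrative choices, not anything the question specifies.

```python
import pandas as pd

# The question's dictionary, named type_dict so it does not shadow the built-in dict.
type_dict = {
    "color": ["red", "blue", "green"],
    "fruits": ["apple", "banana", "grape"],
    "animal": ["cat", "dog"],
}

df = pd.DataFrame(
    {"text": ["house, chair, green", "yellow, banana, wall", "dog, brown, grass"]},
    index=["a", "b", "c"],
)

def split_cell(cell):
    """Hypothetical helper: turn 'x, y, z' into ['x', 'y', 'z']."""
    return [token.strip() for token in cell.split(",")]

def get_type(input_strs):
    """Return the first key whose value list contains one of input_strs."""
    for key, val in type_dict.items():
        for input_str in input_strs:
            if input_str in val:
                return key
    return None  # no value matched

# Split each cell into a list of strings, then look up its key.
df["str_type"] = df["text"].apply(split_cell).apply(get_type)
print(df)
```

Rows with no matching string get None in the new column.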

However, keep in mind that apply() is poorly optimized: it's roughly equivalent to using a for loop to call the function on each element.

If performance is a concern, you might consider inverting your dictionary into something like {"red": "color", "blue": "color", ...} and applying a simpler function such as

# type_dict here is the inverted dictionary (string -> key)
def get_type(input_strs):
    for input_str in input_strs:
        if input_str in type_dict:
            return type_dict[input_str]
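
The inversion itself can be sketched with a dict comprehension. The name inverted is an assumption made here to keep the two dictionaries apart. If the same string appeared under two keys, the later key would silently win.

```python
# The question's dictionary.
type_dict = {
    "color": ["red", "blue", "green"],
    "fruits": ["apple", "banana", "grape"],
    "animal": ["cat", "dog"],
}

# Invert it: every string in a value list becomes a key pointing to its category.
inverted = {word: key for key, words in type_dict.items() for word in words}

def get_type(input_strs):
    """Return the category of the first string found in the inverted dict."""
    for input_str in input_strs:
        if input_str in inverted:  # average O(1) lookup instead of a list scan
            return inverted[input_str]
    return None  # no string matched

print(get_type(["yellow", "banana", "wall"]))
```

Each lookup is now a single hash lookup instead of a scan over every value list.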

You could also consider one of pandas' optimized string methods, such as str.extract(), assuming that df["text"] is a Series of strs rather than lists of strs. There are no optimized pandas functions for Series of lists, and it's generally a bad idea to keep lists in DataFrames if performance is a priority.
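
One possible way to use str.extract() here, assuming df["text"] holds plain strings as in the question: build one regular expression from the known strings, pull out the first match per row, and map it back to its key. The names inverted and pattern are illustrative.

```python
import re

import pandas as pd

type_dict = {
    "color": ["red", "blue", "green"],
    "fruits": ["apple", "banana", "grape"],
    "animal": ["cat", "dog"],
}
# Inverted dictionary: string -> key.
inverted = {word: key for key, words in type_dict.items() for word in words}

df = pd.DataFrame(
    {"text": ["house, chair, green", "yellow, banana, wall", "dog, brown, grass"]},
    index=["a", "b", "c"],
)

# One capture group listing every known string; \b keeps "green" from
# matching inside a longer word such as "greenhouse".
pattern = r"\b(" + "|".join(re.escape(word) for word in inverted) + r")\b"

# extract() returns the first match per row (NaN if none); map() converts
# the matched string to its key.
df["str_type"] = df["text"].str.extract(pattern, expand=False).map(inverted)
print(df)
```

Unlike the apply() version, this picks the first matching string in text order rather than in dictionary order, which only matters when a row matches more than one key.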