Trying to Pass Pandas DataFrame to a Function and Return a Modified DataFrame

595 Views Asked by At

I'm trying to pass different pandas dataframes to a function that does some string modification (usually str.replace operation on columns based on mapping tables stored in CSV files) and return the modified dataframes. And I'm encountering errors especially with handling the dataframe as a parameter.

The mapping table in CSV is structured as follows:

From(Str) To(Str) Regex(True/False)
A A2
B B2
CD (.*) FG CD FG True

My code looks as something like this:

def apply_mapping_table (p_df, p_df_col_name, p_mt_name):

    df_mt = pd.read_csv(p_mt_name)

    for index in range(df_mt.shape[0]):
        # If regex is true
        if df_mt.iloc[index][2] is True:
         # perform regex replacing
            df_p[p_df_col_name] = df_p[p_df_col_name].replace(to_replace=df_mt.iloc[index][0], value = df_mt.iloc[index][1], regex=True)
        else:
            # perform normal string replacing
            p_df[p_df_col_name] = p_df[p_df_col_name].replace(df_mt.iloc[index][0], df_mt.iloc[index][1])

    return df_p

df_new1 = apply_mapping_table1(df_old1, 'Target_Column1', 'MappingTable1.csv')
df_new2 = apply_mapping_table2(df_old2, 'Target_Column2', 'MappingTable2.csv')

I'm getting 'IndexError: single positional indexer is out-of-bounds' for 'df_mt.iloc[index][2]' and haven't gone to the portion where the actual replacement is happening. Any suggestions to make it work or even a better way to do the dataframe string replacements based on mapping tables?

1

There are 1 best solutions below

0
ggeop On

You can use the .iterrows() function to iterate through lookup table rows. Generally, the .iterrows() function is slow, but in this case because the lookup table should be a small manageable table it will be completely fine.

You can adapt your give function as I did in the following snippet:

def apply_mapping_table (p_df, p_df_col_name, p_mt_name):
    df_mt = pd.read_csv(p_mt_name)
    for _, row in df_mt.iterrows():
        # If regex is true
        if row['Regex(True/False)']:
            # perform regex replacing
            df_p[p_df_col_name] = df_p[p_df_col_name].replace(to_replace=row['From(Str)'], value=row['To(Str)'], regex=True)
        else:
            # perform normal string replacing
            p_df[p_df_col_name] = p_df[p_df_col_name].replace(row['From(Str)'], row['To(Str)'])
    return df_p