Could anyone advise on how to refactor this code to make it faster? We are ingesting JSON and transforming it in preparation for a later step, but this particular section of the notebook is taking a lot longer than expected (and we're new to Python). Code is below, thanks!
import pandas as pd

df_main = pd.DataFrame(table_data)
df_parameters = pd.DataFrame(parameter_data)

# Convert df_parameters to dictionary for faster lookups
parameter_dict = dict(zip(df_parameters['abc'], df_parameters['def']))

def replace_parameters(row):
    if isinstance(row['expression'], str) and row['xyz'] == row['xyz']:
        for parameter_name, value in parameter_dict.items():
            if row['expression']:
                row['expression'] = row['expression'].replace(f'|{parameter_name}|', str(value))
            if row['sp']:
                row['sp'] = row['sp'].replace(f'|{parameter_name}|', str(value))
    return row

# Apply the replacement function only when xyz is equal to xyz
df_main = df_main.apply(replace_parameters, axis=1)
Instead of using a for loop, use pattern matching. I am assuming your expressions look like the ones below; if not, adjust the pattern accordingly.

Expressions I used:
|param1| + |param2|
|param2| - |param3|
|param1| * |param3|
|param2| / |param3|

You can use the code below.
Data I used:
Output:
Modify your function like below:
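A minimal sketch of what such a function could look like, assuming the `|name|` placeholder format from the question. The values in `parameter_dict` and the helper name `substitute` are hypothetical, and the `xyz` check from the question is left out for brevity:

```python
import re

# Hypothetical sample values, for illustration only.
parameter_dict = {"param1": 10, "param2": 20, "param3": 30}

# One alternation pattern that matches any |name| placeholder.
# re.escape keeps the '|' delimiters literal instead of meaning "or".
pattern = re.compile("|".join(re.escape(f"|{name}|") for name in parameter_dict))

def substitute(text):
    """Replace every known |name| placeholder in text; pass non-strings through."""
    if not isinstance(text, str):
        return text
    # match.group(0) is e.g. '|param1|'; drop the bars to get the dict key.
    return pattern.sub(lambda m: str(parameter_dict[m.group(0)[1:-1]]), text)

def replace_parameters(row):
    # Works on a pandas row (Series) or a plain dict.
    row["expression"] = substitute(row["expression"])
    row["sp"] = substitute(row["sp"])
    return row
```

It would be applied the same way as in the question, with `df_main.apply(replace_parameters, axis=1)`. Unlike the chained `.replace` calls, this makes a single pass over each string, so a substituted value that itself contains a placeholder is not expanded again.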
Output:
You can see that the parameters are replaced. Again, build the pattern to match your own expressions. For expressions like the ones above, this is the pattern I created:
'\\|param1\\||\\|param2\\||\\|param3\\|'
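The pattern does not have to be typed by hand. Here is a rough sketch that builds it from the dictionary keys and lets pandas do the replacement column by column with `Series.str.replace`, which avoids the row-wise `apply` altogether. The sample data, the parameter values and the helper name `lookup` are made up for illustration:

```python
import re
import pandas as pd

# Hypothetical sample values, for illustration only.
parameter_dict = {"param1": 10, "param2": 20, "param3": 30}

# Builds the regex \|param1\||\|param2\||\|param3\| from the keys.
pattern = "|".join(re.escape(f"|{name}|") for name in parameter_dict)

def lookup(match):
    # match.group(0) is e.g. '|param1|'; drop the bars to get the dict key.
    return str(parameter_dict[match.group(0)[1:-1]])

# Hypothetical stand-in for the question's df_main.
df_main = pd.DataFrame({
    "expression": ["|param1| + |param2|", "|param2| - |param3|", None],
    "sp": ["|param3|", None, "|param1| * |param3|"],
})

# Missing values stay missing; only strings are rewritten.
for column in ["expression", "sp"]:
    df_main[column] = df_main[column].str.replace(pattern, lookup, regex=True)
```

If the replacement should only happen for some rows (the `xyz` condition in the question), a boolean mask with `df_main.loc[mask, column]` on both sides of the assignment should do it.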