I'm working with a large pandas DataFrame and I'm trying to optimize its performance using the numexpr and Dask libraries. I've tried using the numexpr.evaluate() function to perform element-wise operations on the DataFrame, but it's still taking a long time to run.
Here's an example of the code I'm using:
import numexpr
import pandas as pd

df = pd.read_csv('large_data.csv')

# Perform element-wise operation using numexpr.
# Note: numexpr.evaluate() can't resolve attribute access like df.column1;
# the column arrays have to be passed in explicitly via local_dict.
df['new_column'] = numexpr.evaluate(
    'a + b',
    local_dict={'a': df['column1'].to_numpy(), 'b': df['column2'].to_numpy()},
)
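For what it's worth, one thing I noticed while reading the pandas docs: for column-level expressions like this, pandas' own DataFrame.eval() already dispatches to numexpr automatically when it's installed, so the manual call may not be needed at all. A minimal sketch of what I mean (the small inline DataFrame here just stands in for my real CSV):

```python
import pandas as pd

# Stand-in for the real data; column names match my example above
df = pd.DataFrame({'column1': [1.0, 2.0], 'column2': [3.0, 4.0]})

# eval() compiles the string expression and uses the numexpr engine
# when numexpr is available, avoiding intermediate temporaries
df['new_column'] = df.eval('column1 + column2')
```

Is that roughly equivalent to calling numexpr directly, or am I losing something by going through pandas?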
Is there a way to further optimize the performance of this code using Dask or some other method?