AttributeError: DataFrameResampler object has no attribute interpolate in cuDF


With similar syntax, here is the original pandas DataFrame function to resample and interpolate:

def resample_and_interpolate(df, datetime_column):
  # Convert datetime column to datetime format
  df[datetime_column] = pd.to_datetime(df[datetime_column])
  df = df.set_index(datetime_column)

  # Resample to 1-minute intervals and interpolate
  df_resampled = df.resample('1T').interpolate(method='linear')

  # Resample to 1-hour intervals and forward fill and backward fill
  resampled_df = df_resampled.resample('1H').first().ffill().bfill()

  # Reset index to get datetime back as a column
  resampled_df = resampled_df.reset_index()

  return resampled_df

And here is the cuDF version:

def g_resample_and_interpolate(gdf, datetime_column):
  # Convert datetime column to datetime format
  gdf[datetime_column] = cudf.to_datetime(gdf[datetime_column])
  gdf = gdf.set_index(datetime_column)

  # Resample to 1-minute intervals and interpolate
  gdf_resampled = gdf.resample('1T').interpolate(method='linear')

  # Resample to 1-hour intervals and forward fill and backward fill
  resampled_gdf = gdf_resampled.resample('1H').first().ffill().bfill()

  # Reset index to get datetime back as a column
  resampled_gdf = resampled_gdf.reset_index()

  return resampled_gdf

The original pandas function works fine, but the cuDF (GPU DataFrame) version raises an error.

Usage:


resample_and_interpolate(gdf_dict[20000019].to_pandas(), 'charttime')
g_resample_and_interpolate(gdf_dict[20000019], 'charttime')

Error returned:


KeyError: 'interpolate'
      6   # Resample to 1-minute intervals and interpolate
----> 7   df_resampled = df.resample('1T').interpolate(method='linear')

AttributeError: DataFrameResampler object has no attribute interpolate
Answer:

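Call asfreq() on the resampler before interpolate() to resolve that. asfreq() turns the resampler back into a regular DataFrame, and the DataFrame does have an interpolate() method.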

#@title Define Resample, Interpolate, Backward/Forward Fill Function
def resample_interpolate_bffill(gdf, date_colname = 'charttime'):
  return gdf.set_index(date_colname).resample('1T').asfreq().interpolate().resample('1H').bfill().ffill()
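
To see what the extra asfreq() step does, here is a minimal sketch of the same pattern in plain pandas, which runs without a GPU. The sample data and the column name value are made up for illustration, and I am assuming the cuDF chain above behaves the same way as its pandas counterpart.

```python
import pandas as pd

# Hypothetical sample data: two readings ten minutes apart.
df = pd.DataFrame(
    {
        "charttime": ["2024-01-01 00:00", "2024-01-01 00:10"],
        "value": [0.0, 10.0],
    }
)
df["charttime"] = pd.to_datetime(df["charttime"])
df = df.set_index("charttime")

# Step 1: upsample to a regular 1-minute grid.
# asfreq() returns an ordinary DataFrame; the newly created rows hold NaN.
upsampled = df.resample("1min").asfreq()

# Step 2: interpolate on the DataFrame itself, not on the resampler object.
interpolated = upsampled.interpolate(method="linear")

print(interpolated)
```

The point is that interpolate() is called on the DataFrame that asfreq() returns, so it no longer matters whether the resampler object itself offers interpolate().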