Spark/Koalas implementation of pandas resample('D') method


I have a Spark DataFrame that needs to be forward-filled (ffill). The DataFrame is large (more than 100 million rows). I can get the result I want with pandas, as shown below.

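new_df = (df_pd.set_index('someDateColumn')
               .groupby(['Column1', 'Column2', 'Column3'])
               # one row per calendar day from each group's first to last date
               .resample('D')
               # newly created days take the values of the last observed row
               .ffill()
               # drop the group-key index levels, then turn the date index back into a column
               .reset_index(['Column1', 'Column2', 'Column3'], drop=True)
               .reset_index())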

I got stuck when trying to use .resample('D') in Koalas. Is there a better way to replicate this forward-fill logic with Spark-native functions? I want to avoid pandas because it is not distributed and runs only on the driver node.

How can I achieve the same as above using Spark/Koalas packages?


If you are looking for forward fill in Spark, this tutorial shows how to do it: here