I have a Spark DataFrame that needs to be forward-filled (ffill). The DataFrame is large (>100 million rows). I'm able to achieve what I want using pandas, as shown below:
new_df = df_pd.set_index('someDateColumn') \
.groupby(['Column1', 'Column2', 'Column3']) \
.resample('D') \
.ffill() \
.reset_index(['Column1', 'Column2', 'Column3'], drop=True) \
.reset_index()
I got stuck at the .resample('D') step when trying this with Koalas. Is there a better alternative for replicating the ffill logic with Spark-native functions? I want to avoid pandas because it is not distributed and executes only on the driver node.

How can I achieve the same as above using Spark or Koalas?
If you are looking for forward fill in Spark, this tutorial shows how to do it - here