Ffill and interpolate koalas dataframe


Is it possible to forward-fill some columns of a Koalas DataFrame and interpolate others, something like this?

%%spark -s sparkenv2

import databricks.koalas as ks

kdf = ks.DataFrame({
    'id':[1,2,3,4],
    'A': [None, 3, None, None],
    'B': [2, 4, None, 3],
    'C': [99, None, None, 1],
    'D': [0, 1, 5, 4]
    },
    columns=['id','A', 'B', 'C', 'D'])

kdf['A']=kdf['A'].ffill()
kdf['B']=kdf['B'].interpolate()

There are 2 best solutions below

For ffill, this approach is taken from John Paton's blog:

from pyspark.sql import Window
from pyspark.sql.functions import last

spark_df = kdf.to_spark()

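# define the window: every row from the start of the frame up to the current row
# (Window.unboundedPreceding avoids the undefined `sys` name used originally)
window = Window.orderBy('id').rowsBetween(Window.unboundedPreceding, Window.currentRow)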

# define the forward-filled column
filled_column = last(spark_df['A'], ignorenulls=True).over(window)

# do the fill
spark_df_filled = spark_df.withColumn('A_filled', filled_column)

I have no answer for interpolate - still trying to find it myself.

PS - You can switch to backfill by changing the window to rowsBetween(Window.currentRow, Window.unboundedFollowing) and using first() rather than last().
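
To make the fill semantics concrete, here is a minimal plain-Python sketch of what a forward fill and a backfill compute on columns A and C from the question, assuming the rows are ordered by id. The helper names forward_fill and backward_fill are hypothetical and only illustrate the logic; they are not Spark or Koalas APIs.

```python
def forward_fill(values):
    """Replace each None with the most recent non-None value before it."""
    filled = []
    last_seen = None
    for v in values:
        if v is not None:
            last_seen = v
        filled.append(last_seen)
    return filled


def backward_fill(values):
    """Replace each None with the next non-None value after it."""
    # A backfill is a forward fill applied to the reversed sequence.
    return forward_fill(values[::-1])[::-1]


col_a = [None, 3, None, None]  # column A from the question, ordered by id
col_c = [99, None, None, 1]    # column C from the question, ordered by id

print(forward_fill(col_a))   # [None, 3, 3, 3]
print(backward_fill(col_c))  # [99, 1, 1, 1]
```

Note that the leading None in column A stays None under a forward fill, because there is no earlier value to carry forward.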

kdf['A']=kdf['A'].ffill()

Yes, you can.

kdf['B']=kdf['B'].interpolate()

No, you can't. The method pd.Series.interpolate() is not implemented yet.
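
For reference, the arithmetic behind a linear interpolation can be sketched in plain Python. This is only an illustration on column B from the question, assuming evenly spaced rows ordered by id; interpolate_linear is a hypothetical helper, not a Koalas or Spark function. In this sketch, leading and trailing gaps are left as None, which may differ from the pandas defaults.

```python
def interpolate_linear(values):
    """Fill None gaps that lie between two known values on a straight line."""
    result = list(values)
    # Positions of the values that are actually present.
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over each pair of neighbouring known positions and fill the gap.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


col_b = [2, 4, None, 3]  # column B from the question, ordered by id
print(interpolate_linear(col_b))  # [2, 4, 3.5, 3]
```

If the data is small enough to fit on the driver, another possible workaround is to convert with kdf.to_pandas() and call the pandas interpolate() there, at the cost of losing distributed execution.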