Add an end-of-month column dynamically to a Spark DataFrame


I have a PySpark DataFrame as follows:

[Image: input DataFrame]

For each id, I need to fill the null values in the EOM column dynamically, based on the last non-null EOM value, so that the month-end dates continue without gaps.
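The fill rule can be pinned down in plain Python for a single id before translating it to Spark. This is a sketch under assumptions: the list is already in the intended row order, each null becomes the month-end one month after the previous row, and nulls with no earlier non-null value stay None. The function names month_end_after and fill_eom are hypothetical.

```python
import calendar
from datetime import date
from typing import List, Optional


def month_end_after(d: date, months: int) -> date:
    """Return the last day of the month lying `months` months after d's month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    return date(year, month, calendar.monthrange(year, month)[1])


def fill_eom(eoms: List[Optional[date]]) -> List[Optional[date]]:
    """Fill None entries of one id's ordered EOM values with consecutive month-ends."""
    filled: List[Optional[date]] = []
    last: Optional[date] = None  # last non-null EOM seen so far
    gap = 0                      # rows since that non-null value
    for eom in eoms:
        if eom is not None:
            last, gap = eom, 0
            filled.append(eom)
        elif last is None:
            filled.append(None)  # nothing earlier to extend from
        else:
            gap += 1
            filled.append(month_end_after(last, gap))
    return filled
```

For example, [2023-01-31, None, None, 2023-06-30, None] becomes [2023-01-31, 2023-02-28, 2023-03-31, 2023-06-30, 2023-07-31]: each null counts months from the last non-null value, and a new non-null value restarts the count.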

The expected output DataFrame looks like this:

[Image: expected output DataFrame]

I have tried this logic:

from pyspark.sql.functions import add_months, first

df.where("EOM IS NOT NULL").groupBy("id").agg(add_months(first("EOM"), 1))

but this returns one aggregated row per id instead of filling in the null rows, so the result is not in the expected format.


Answer:

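from pyspark.sql import SparkSession
from pyspark.sql.functions import expr, to_date

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("2015-06-23", 5), ("2016-07-20", 7)],
    ("data_date", "months_to_add")
).select(to_date("data_date").alias("data_date"), "months_to_add")

# Calling add_months through expr() lets the number of months come from another column
df.withColumn("new_data_date", expr("add_months(data_date, months_to_add)")).show()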