PySpark Lag function based on condition

515 Views Asked by SamaAdi At 28 June 2025 at 05:29

I am new to PySpark and have been trying out a few things.

I have a data frame as follows:

+----------+-----------+
|   Column1|    Column2|
+----------+-----------+
|    VALUE1|      30000|
|    VALUE2|      25000|
|    VALUE3|      20000|
|    VALUE4|      19500|
|    VALUE5|      18100|
+----------+-----------+

I want to add a new column, Column3, whose value is given by the following formula:

CurrentRow[Column3] = 
    IF (CurrentRow[Column2] > PreviousRow[Column3]) 
    THEN PreviousRow[Column3]
    ELSE CurrentRow[Column2] * 0.9

Expected output:

+----------+------------------+------------------+
|   Column1|           Column2|           Column3|
+----------+------------------+------------------+
|    VALUE1|             30000|             27000|
|    VALUE2|             25000|             22500|
|    VALUE3|             20000|             18000|
|    VALUE4|             19500|             18000|
|    VALUE5|             18100|             18000|
+----------+------------------+------------------+
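Note that the formula is recursive: each Column3 value depends on the Column3 value of the row before it, not only on Column2. A minimal plain-Python sketch of that recurrence is below, mainly to pin down the intended logic before involving Spark. The function name `column3_values` and the `factor` parameter are made up for illustration, and the handling of the first row (no previous row, so the ELSE branch applies) is an assumption inferred from the example table.

```python
def column3_values(column2_values, factor=0.9):
    """Apply the formula row by row, carrying the previous Column3 forward."""
    result = []
    for current in column2_values:
        # Assumption: the first row has no previous Column3, so it takes the ELSE branch.
        if result and current > result[-1]:
            result.append(result[-1])        # THEN PreviousRow[Column3]
        else:
            result.append(current * factor)  # ELSE CurrentRow[Column2] * 0.9
    return result


column2 = [30000, 25000, 20000, 19500, 18100]
for value, computed in zip(column2, column3_values(column2)):
    print(value, computed)
```

Run on the Column2 values above, this gives 27000, 22500, 18000, 18000, 18000, which matches the Column3 values in the table.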

I searched for a way to use the lag function on the same column that is being created (with withColumn), but did not succeed.
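For context on why this is hard: a window function such as lag can only read columns that already exist in the input data frame, so lag("Column3") cannot see values that the same withColumn call is still producing. One possible workaround, sketched below under several assumptions, is to push all rows through a single ordered pass with groupBy(...).applyInPandas(...). The names `fill_column3`, `add_column3`, `demo` and the helper column `_grp` are hypothetical; the sketch also assumes that rows are ordered by Column1, that Column2 is an integer column, and that Spark 3.0 or later, pandas and pyarrow are available.

```python
def fill_column3(pdf):
    """Receives every row as one pandas DataFrame and walks it in order."""
    pdf = pdf.sort_values("Column1")  # assumption: Column1 defines the row order
    values = []
    for current in pdf["Column2"]:
        # Assumption: the first row has no previous Column3, so it uses Column2 * 0.9.
        if values and current > values[-1]:
            values.append(values[-1])
        else:
            values.append(current * 0.9)
    # Drop the helper grouping column and return only the declared schema columns.
    return pdf.assign(Column3=values)[["Column1", "Column2", "Column3"]]


def add_column3(df):
    # Imported here so fill_column3 can be tried without a Spark installation.
    from pyspark.sql import functions as F

    return (
        df.withColumn("_grp", F.lit(0))  # one constant group, so one ordered pass
        .groupBy("_grp")
        .applyInPandas(
            fill_column3,
            schema="Column1 string, Column2 long, Column3 double",
        )
    )


def demo():
    # Builds the data frame from the question and shows the result.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(
        [("VALUE1", 30000), ("VALUE2", 25000), ("VALUE3", 20000),
         ("VALUE4", 19500), ("VALUE5", 18100)],
        ["Column1", "Column2"],
    )
    add_column3(df).orderBy("Column1").show()
```

Calling `demo()` needs pyspark, pandas and pyarrow installed. Because everything lands in a single group, all rows are processed in one task, so this only suits data that fits comfortably in one executor's memory. If the real data has a natural key within which the formula restarts, grouping by that key instead of the constant `_grp` would let Spark process the groups in parallel.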
