PySpark Lag function based on condition


I am new to PySpark and have been trying out a few things.

I have a data frame as follows:

+----------+-----------+
|   Column1|    Column2|
+----------+-----------+
|    VALUE1|      30000|
|    VALUE2|      25000|
|    VALUE3|      20000|
|    VALUE4|      19500|
|    VALUE5|      18100|
+----------+-----------+

I want to add a new column whose value is given by the following formula:

CurrentRow[Column3] = 
    IF (CurrentRow[Column2] > PreviousRow[Column3]) 
    THEN PreviousRow[Column3]
    ELSE CurrentRow[Column2] * 0.9

Example below (the first row has no previous row, so it takes Column2 * 0.9):

+----------+------------------+------------------+
|   Column1|           Column2|           Column3|
+----------+------------------+------------------+
|    VALUE1|             30000|             27000|
|    VALUE2|             25000|             22500|
|    VALUE3|             20000|             18000|
|    VALUE4|             19500|             18000|
|    VALUE5|             18100|             18000|
+----------+------------------+------------------+
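To make the rule concrete, here is a minimal sketch of the recurrence in plain Python (not PySpark). It assumes the rows are processed in the order shown and that the first row, having no previous row, takes Column2 * 0.9. The function name `compute_column3` and the `factor` parameter are made up for illustration. Each value depends on the previously computed Column3 value, so the column has to be built row by row.

```python
from typing import List, Optional


def compute_column3(column2: List[float], factor: float = 0.9) -> List[float]:
    """Apply the recurrence from the formula above, one row at a time.

    Hypothetical helper: assumes the input is already in the intended row
    order and that the first row falls into the ELSE branch.
    """
    result: List[float] = []
    prev: Optional[float] = None  # previous row's Column3, None for the first row
    for value in column2:
        if prev is not None and value > prev:
            # Column2 is above the previous Column3: carry the previous value forward
            current = prev
        else:
            # Otherwise (or on the first row): take 90% of Column2
            current = value * factor
        result.append(current)
        prev = current
    return result


column2 = [30000, 25000, 20000, 19500, 18100]
column3 = compute_column3(column2)
print(column3)
```

With the sample data this gives the same Column3 values as the table above. The loop carries `prev` from one row to the next, which is the part that a lag over an existing column does not capture on its own.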

I tried applying the lag function to the same column that is being updated (via withColumn), but could not get it to work.
