I want to create a new conditional variable "idx_cut" using following condition:
if idx = 1, then idx_cut = thru_dt + 30 (i.e., add 30 days to "thru_dt" date variable)
else: if from_dt > the lag(idx_cut), then idx_cut = thru_dt + 30 else retain idx_cut
This is what I tried:
w = Window.partitionBy('id').orderBy('id', 'from_dt', 'thru_dt')
df = data.withColumn('idx_cut', when((col('idx') ==lit(1)), col('hse_clm_thru_dt')+lit(30)).otherwise(None)). \
withColumn('idx_cut', when((col('idx_cut').isNotNull()), col('idx_cut')). \
otherwise(when((col('from_dt') < last('idx_cut', True).over(w)), last('idx_cut', True).over(w1x)).otherwise((col('thru_dt')+lit(30)))))
and this is the output:
+------------+---------+-----+---------------+---------------+----------+
| id| lag_id| idx| from_dt| thru_dt| idx_cut|
+------------+---------+-----+---------------+---------------+----------+
| C| null| 1| 2023-01-02| 2023-01-08|2023-02-07|
| C| C| 2| 2023-01-09| 2023-01-09|2023-02-07|
| C| C| 3| 2023-02-01| 2023-02-04|2023-02-07|
| C| C| 4| 2023-02-06| 2023-02-10|2023-02-07|
| C| C| 5| 2023-02-13| 2023-02-14|*2023-03-16*|
| C| C| 6| *2023-02-15*| 2023-02-23|**2023-03-25**|
| C| C| 7| 2023-02-24| 2023-03-03|2023-04-02|
| C| C| 8| 2023-03-07| 2023-03-11|2023-04-10|
| C| C| 9| 2023-03-14| 2023-03-16|2023-04-15|
| C| C| 10| 2023-03-20| 2023-03-21|2023-04-20|
| C| C| 11| 2023-03-26| 2023-03-31|2023-04-30|
| C| C| 12| 2023-04-03| 2023-04-10|2023-05-10|
| C| C| 13| 2023-04-15| 2023-04-22|2023-05-22|
| C| C| 14| 2023-04-24| 2023-04-28|2023-05-28|
| C| C| 15| 2023-05-04| 2023-05-18|2023-06-17|
+------------+---------+---+---------------+---------------+----------+
Unfortunately, this is not the desired output. My code works well up to where idx = 5, at which point it is supposed to reset the value of idx_cut on the sixth row to the lag value "2023-03-16" (But I get "2023-03-25").
The correct output would be the following:
+--+---------+-----+---------------+---------------+----------+
|id| lag_id| idx| from_dt | thru_dt| idx_cut|
+------------+-----+---------------+---------------+----------+
|C | null| 1| 2023-01-02| 2023-01-08|2023-02-07|
|C | C| 2| 2023-01-09| 2023-01-09|2023-02-07|
|C | C| 3| 2023-02-01| 2023-02-04|2023-02-07|
|C | C| 4| 2023-02-06| 2023-02-10|2023-02-07|
|C | C| 5| 2023-02-13| 2023-02-14|*2023-03-16*|
|C | C| 6| *2023-02-15*| 2023-02-23|**2023-03-16**|
|C | C| 7| 2023-02-24| 2023-03-03|**2023-03-16**|
|C | C| 8| 2023-03-07| 2023-03-11|**2023-03-16**|
|C | C| 9| 2023-03-14| 2023-03-16|**2023-03-16**|
|C | C| 10| 2023-03-20| 2023-03-21|2023-04-20|
|C | C| 11| 2023-03-26| 2023-03-31|**2023-04-20**|
|C | C| 12| 2023-04-03| 2023-04-10|**2023-04-20**|
|C | C| 13| 2023-04-15| 2023-04-22|**2023-04-20**|
|C | C| 14| 2023-04-24| 2023-04-28|2023-05-28|
|C | C| 15| 2023-05-04| 2023-05-18|**2023-05-04**|
+--+---------+-----+---------------+---------------+----------+
Thank you in advance!