I have a table with values and a date starting from which this value is valid:
param | validfrom | value |
---|---|---|
param1 | 01-01-2022 | 1 |
param2 | 03-01-2022 | 2 |
param1 | 05-01-2022 | 11 |
param1 | 07-01-2022 | 1 |
I need to get values of each parameter on each day before the specified date
For example, I have to get values for 06-01-2022:
param | validfrom | value |
---|---|---|
param1 | 01-01-2022 | 1 |
param1 | 02-01-2022 | 1 |
param1 | 03-01-2022 | 1 |
param1 | 04-01-2022 | 1 |
param1 | 05-01-2022 | 11 |
param1 | 06-01-2022 | 11 |
param2 | 03-01-2022 | 2 |
param2 | 04-01-2022 | 2 |
param2 | 05-01-2022 | 2 |
param2 | 06-01-2022 | 2 |
In other words, I need to get rows for missing dates and fill values by previous value.
I use window funсtion to get last value like this:
val windowPartitionByCompositeKey = Window.partitionBy(CompositeKey.map(col):_*)
spark.table("table")
.where($"$ValidFrom" <= repDt )
.distinct()
.withColumn("max_validfrom", max($"$ValidFrom").over(windowPartitionByCompositeKey))
.where($"$ValidFrom" === $"max_validfrom")
.drop("max_validfrom")
.show()
and I get next result:
param | validfrom | value |
---|---|---|
param2 | 03-01-2022 | 2 |
param1 | 05-01-2022 | 11 |
But I need to get value of each param on each day before the specified date.
How I can get this?