I want to split a quantity value across multiple rows, one row per month between the start date and end date columns, with the quantity divided evenly by the number of months. Each row should carry the start date and end date of its month. I also want a remaining quantity column, computed from the previous row's value. Below is my sample input and output.
Divide a column value into multiple rows by number of months based on start date & end date columns
Asked by isrikanthd
Based on what I understood from the input and output in the question, here's an example:
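from pyspark.sql import functions as func

# data_sdf is the input DataFrame with the columns start, end and qty
data_sdf. \
    withColumn('seq',
               # array of month-start dates, from the start month through the end month
               func.expr('sequence(trunc(start, "month"), last_day(end), interval 1 month)')
               ). \
    withColumn('seq_st_end',
               # x is the month-start date, i is its 0-based position in the array
               func.transform('seq',
                              lambda x, i: func.struct(x.alias('start'),
                                                       func.last_day(x).alias('end'),
                                                       # equal share of qty per month
                                                       (func.col('qty')/func.size('seq')).alias('qty'),
                                                       # qty left after this month's share is consumed
                                                       (func.col('qty') - ((func.col('qty')/func.size('seq')) * (i+1))).alias('remaining_qty')
                                                       )
                              )
               ). \
    selectExpr('inline(seq_st_end)'). \
    show(truncate=False)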
# +----------+----------+-----+-------------+
# |start     |end       |qty  |remaining_qty|
# +----------+----------+-----+-------------+
# |2023-01-01|2023-01-31|400.0|1600.0       |
# |2023-02-01|2023-02-28|400.0|1200.0       |
# |2023-03-01|2023-03-31|400.0|800.0        |
# |2023-04-01|2023-04-30|400.0|400.0        |
# |2023-05-01|2023-05-31|400.0|0.0          |
# +----------+----------+-----+-------------+
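We can use sequence to create an array of month-start dates covering every month from start to end. transform then maps over that array to build, for each month, a struct holding its start date, its end date (via last_day), the per-month quantity (qty divided by the array size) and the remaining quantity (qty minus the per-month quantity times the number of months consumed so far). Finally, inline explodes the array of structs into one row per month.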
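To make the arithmetic easier to follow, here is a minimal plain-Python sketch of the same logic, without Spark. The function name split_by_month is hypothetical, and the input values (a start of 2023-01-10, an end of 2023-05-20, and a qty of 2000) are assumptions chosen only to be consistent with the output shown above, since the original sample input is not reproduced here.

```python
import calendar
from datetime import date


def split_by_month(start, end, qty):
    """Split qty evenly across every calendar month touched by [start, end]."""
    if start > end:
        raise ValueError("start must not be after end")

    # Enumerate (year, month) pairs from the start month through the end month,
    # mirroring sequence(trunc(start, "month"), last_day(end), interval 1 month).
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

    per_month = qty / len(months)  # equal share per month
    rows = []
    for i, (y, m) in enumerate(months):
        last_day = calendar.monthrange(y, m)[1]  # number of days in the month
        rows.append({
            "start": date(y, m, 1),
            "end": date(y, m, last_day),
            "qty": per_month,
            # qty left after the first (i + 1) monthly shares are consumed
            "remaining_qty": qty - per_month * (i + 1),
        })
    return rows


# Assumed sample input: one record spanning January to May 2023 with qty 2000.
rows = split_by_month(date(2023, 1, 10), date(2023, 5, 20), 2000)
for row in rows:
    print(row["start"], row["end"], row["qty"], row["remaining_qty"])
```

Under those assumed inputs this yields five rows of 400.0 each, with the remaining quantity falling from 1600.0 to 0.0, which matches the table printed by the Spark version.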
