How do I create a sequence in PySpark that resets when rows change from 0 to 1 and increments while all are 1s?

164 Views Asked by Shay Pal At 10 May 2021 at 14:09

I have a PySpark DataFrame like this and need the SEQ output as shown:

R_ID    ORDER   SC_ITEM seq
A   1       0
A   3   1   1
A   4   1   2
A   5   1   3
A   6   1   4
A   7   1   5
A   8   1   6
A   9   1   7
A   10  0   0
A   11  1   1
A   12  0   0
A   13  1   
A   14  0   
A   15  1   1
A   16  1   2
A   17  1   3
A   18  1   4
A   19  1   5
A   20  1   6
A   21  0   0
A   22  0   0
B   1   0   0
B   2   1   1
C   1   1   1
C   2   1   2
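
The rule the table appears to follow can be written down as a small plain-Python reference, which may help when checking a Spark result. This is only a sketch of my reading of the data: within each R_ID, ordered by ORDER, SEQ is 0 on a row where SC_ITEM is 0 and otherwise counts the consecutive 1s since the last 0. The function name `expected_seq` is made up for illustration.

```python
from itertools import groupby


def expected_seq(rows):
    """Compute the expected SEQ for (r_id, order, sc_item) tuples.

    Assumption: SC_ITEM is only ever 0 or 1, and ORDER is unique per R_ID.
    Returns a list of (r_id, order, sc_item, seq) tuples.
    """
    out = []
    # Sort by R_ID, then ORDER, so each R_ID forms one contiguous group.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for r_id, group in groupby(ordered, key=lambda r: r[0]):
        run = 0  # length of the current run of 1s
        for _, order, sc_item in group:
            # A 1 extends the run; a 0 resets it.
            run = run + 1 if sc_item == 1 else 0
            out.append((r_id, order, sc_item, run))
    return out
```

For example, `expected_seq([("C", 1, 1), ("C", 2, 1)])` yields SEQ values 1 and 2, matching the two C rows above.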

I am not sure whether the data displays properly, so I attached a picture.

I tried something like this:

RN = Window().orderBy(lit('A'))


.when(((F.col("R_ID")==(lag(F.col("R_ID"),1).over(RN))) & (F.col("SC_ITEM")== 1)), (F.col("SC_ITEM") + (lag(F.col("SEQ"),1).over(RN))))\

I am not sure whether I can apply lead or lag over SEQ, since it is the column being computed. Please help me with how to do this.
