I have a single-column DataFrame like this:
------------
date
------------
01/01/2020
02/01/2020
04/01/2020
05/01/2020
06/01/2020
I need to find the longest continuous period (the longest run of consecutive dates), along with its start date and end date. For the example above, the expected output is:
-----------------------------------------------
start      | end        | period_length |
-----------------------------------------------
04/01/2020 | 06/01/2020 | 3             |
My approach: sort the data, compute the lag between each row and the previous row, and reset the period length whenever that lag is greater than 1 day. However, I cannot figure out how to reset the period length on a particular condition. I am using Spark 2.3.
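To make the intended logic concrete, here is a minimal sketch of the approach in plain Python (not Spark). It only illustrates the reset step I am unable to express with DataFrame operations. The function name `longest_run` and the `dd/MM/yyyy` format are assumptions taken from the sample data above, not from my real table.

```python
from datetime import datetime, timedelta

def longest_run(dates, fmt="%d/%m/%Y"):
    """Return (start, end, length) of the longest run of consecutive days.

    Hypothetical helper for illustration; assumes dd/MM/yyyy strings
    like the sample data. Returns None for an empty input.
    """
    # Sort and de-duplicate so each calendar day is counted once.
    days = sorted({datetime.strptime(d, fmt).date() for d in dates})
    if not days:
        return None

    best_start = run_start = days[0]
    best_len = run_len = 1
    for prev, cur in zip(days, days[1:]):
        if cur - prev == timedelta(days=1):
            run_len += 1                    # still in the same period
        else:
            run_start, run_len = cur, 1     # lag > 1 day: reset the period
        if run_len > best_len:              # on ties, the earliest run is kept
            best_start, best_len = run_start, run_len

    best_end = best_start + timedelta(days=best_len - 1)
    return best_start.strftime(fmt), best_end.strftime(fmt), best_len

sample = ["01/01/2020", "02/01/2020", "04/01/2020", "05/01/2020", "06/01/2020"]
print(longest_run(sample))  # ('04/01/2020', '06/01/2020', 3)
```

The `else` branch is the "reset on a condition" that I do not know how to write in Spark, where rows are not processed by a sequential loop like this.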
Note: my actual column is named "eventTime" and has values like "2020-12-14 13:49:32".
Result