I have a Scala Spark dataframe with the schema:
root
|-- passengerId: string (nullable = true)
|-- travelHist: array (nullable = true)
| |-- element: integer (containsNull = true)
I want to iterate through the array elements and find the maximum number of consecutive 0 values that occur between a 1 and the next 2.
| passengerID | travelHist |
|---|---|
| 1 | 1, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0 |
| 2 | 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 2, 0, 0, 0, 0 |
| 3 | 0, 0, 0, 2, 1, 0, 2, 1, 0 |
The output for the above records should look like below:
| passengerID | maxStreak |
|---|---|
| 1 | 7 |
| 2 | 3 |
| 3 | 1 |
What would be the most efficient way to find the longest such interval, assuming the array never has more than 50 elements?
Let us do some pattern matching.
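One way to read "pattern matching" here is a recursive match over the array converted to a `List`. The following is only a minimal sketch: the names `MaxStreak`, `maxStreak` and `loop` are made up for illustration, and it assumes that a streak is opened by a 1, counts only 0s, is closed by the next 2, and that a second 1 before the 2 restarts the count. It does not handle a null array or null elements, which the schema allows.

```scala
import scala.annotation.tailrec

object MaxStreak {
  // Longest run of 0s that follows a 1 and is closed by a 2.
  // Runs that are never closed by a 2, or never opened by a 1, are ignored.
  def maxStreak(hist: Seq[Int]): Int = {
    // open: Some(n) while inside a run opened by a 1 with n zeros seen so far,
    //       None when no run is currently open.
    // best: longest closed run found so far.
    @tailrec
    def loop(rest: List[Int], open: Option[Int], best: Int): Int = rest match {
      case Nil       => best
      case 1 :: tail => loop(tail, Some(0), best)           // (re)open a run
      case 0 :: tail => loop(tail, open.map(_ + 1), best)   // count only inside a run
      case 2 :: tail => loop(tail, None, math.max(best, open.getOrElse(0))) // close the run
      case _ :: tail => loop(tail, None, best)              // assumption: any other value breaks the run
    }
    loop(hist.toList, None, 0)
  }
}
```

For the three sample rows this yields 7, 3 and 1. With at most 50 elements per array, a single linear pass like this should be cheap. To apply it to the dataframe, one option is to wrap it as a UDF with `udf(MaxStreak.maxStreak _)` from `org.apache.spark.sql.functions` and call it on `col("travelHist")` inside `withColumn("maxStreak", ...)`.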