I have the following DataFrame:
root
 |-- AUTHOR_ID: integer (nullable = false)
 |-- Books: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- NAME: string (nullable = true)
 |    |    |-- DATE: TimestampType (nullable = true)
How can I find the oldest published book for each author? I want to retrieve its date in a result shaped like this:
{
"AUTHOR_ID": 1,
"FIRST_PUBLICATION": <Date>,
"Books": "[ ... ]"
}
There are many ways to do this; let's try window functions.
Following your edits: