My df schema is:
|-- students: array (nullable = true)
| |-- element: string (containsNull = true)
and the actual data is as follows:
+--------------------------------------------------+
|                                          students|
+--------------------------------------------------+
|                            [(Alice:20), (Bob:13)]|
|[(James:39), (Robert:29), (Kevin:31), (Andrew:48)]|
|                                    [(Richard:88)]|
+--------------------------------------------------+
I want to convert this column to JSON format:
+-----------------------------------------------------------------------------------------------------------+
|json_student                                                                                               |
+-----------------------------------------------------------------------------------------------------------+
|[{"name":"Alice","age":20},{"name":"Bob","age":13}]                                                        |
|[{"name":"James","age":39},{"name":"Robert","age":29},{"name":"Kevin","age":31},{"name":"Andrew","age":48}]|
|[{"name":"Richard","age":88}]                                                                              |
+-----------------------------------------------------------------------------------------------------------+
What makes this hard for me is that the length of the array differs from row to row.
Is there a simple way to produce the format I want?
I'm using Spark 2.4.4.
You can solve this with a UDF that parses each array element and serializes the whole array to a JSON string.
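A minimal PySpark sketch of such a UDF, assuming every element has the form `(name:age)` with an integer age. The names `students_to_json` and `add_json_column` are made up for illustration, and skipping null or malformed elements is my assumption, since the question does not say how they should be handled. Because the UDF runs once per row on that row's whole array, the varying array length is not a problem.

```python
import json
import re

# Matches "(name:age)"; the name is everything up to the last colon.
_STUDENT = re.compile(r"\((.+):(-?\d+)\)")


def students_to_json(students):
    """Turn ["(Alice:20)", "(Bob:13)"] into a compact JSON array string."""
    if students is None:
        return None  # keep a null array as a null result
    records = []
    for item in students:
        if item is None:
            continue  # assumption: null elements are skipped
        match = _STUDENT.fullmatch(item.strip())
        if match is None:
            continue  # assumption: malformed elements are skipped
        records.append({"name": match.group(1), "age": int(match.group(2))})
    # Compact separators give {"name":"Alice","age":20} with no extra spaces.
    return json.dumps(records, separators=(",", ":"), ensure_ascii=False)


def add_json_column(df):
    """Add a json_student column computed from the students array column."""
    # Imported here so the parsing function above can be tried without Spark.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    to_json_udf = udf(students_to_json, StringType())
    return df.withColumn("json_student", to_json_udf(col("students")))
```

With the DataFrame from the question, usage would look like `add_json_column(df).select("json_student").show(truncate=False)`.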
Output:
Result