I have a table in Hive with two columns, id (int) and xml_column (string). xml_column actually contains XML, but it is stored as a string.
+------+--------------------+
|  id  |     xml_column     |
+------+--------------------+
| 6723 |<?xml version="1....|
| 6741 |<?xml version="1....|
| 6774 |<?xml version="1....|
+------+--------------------+
My question is: I would like to parse this XML and split it into columns according to a schema, using Spark (Scala). Can anyone help me with how to handle this? I tried the Databricks spark-xml library, but it works with XML files rather than with an XML string column.
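One workaround that needs no extra library is to parse each string with the JDK's built-in DOM parser and XPath, then wrap that in a Spark UDF. The sketch below assumes a hypothetical document shape (`<record><name>...</name><amount>...</amount></record>`) and a hypothetical case class XmlRecord, because the real XML is truncated in the table; the XPath expressions would need to match your actual element names. If you only need a few scalar fields, Spark SQL's built-in Hive-style XPath functions (for example xpath_string and xpath_double, usable via expr(...)) can also read the string column directly without a UDF.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathFactory
import org.w3c.dom.Document
import org.xml.sax.InputSource

// Hypothetical target schema: the real element names are not shown in the question.
case class XmlRecord(name: String, amount: Double)

object XmlStringParser {
  // Parse one XML document held in a String (as stored in xml_column).
  def parseDocument(xml: String): Document =
    DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)))

  // Extract the fields of the hypothetical XmlRecord.
  // Returns None for null input, malformed XML, or a non-numeric amount.
  def toRecord(xml: String): Option[XmlRecord] =
    if (xml == null) None
    else try {
      val doc = parseDocument(xml)
      val xpath = XPathFactory.newInstance().newXPath()
      // Hypothetical paths; replace with the real structure of your XML.
      val name = xpath.evaluate("/record/name", doc)
      val amount = xpath.evaluate("/record/amount", doc).toDouble
      Some(XmlRecord(name, amount))
    } catch {
      case _: Exception => None
    }
}
```

Inside Spark the same function could be registered with org.apache.spark.sql.functions.udf, for example `udf((s: String) => XmlStringParser.toRecord(s))`, applied to xml_column under an alias such as parsed, and expanded with `select("id", "parsed.*")`; that Spark wiring is not exercised by the snippet. The code avoids Scala 2.13-only APIs, since Spark 2.3 is built against Scala 2.11.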
Alternatively, is there a way to convert this string column to JSON? I already have a JSON parser that can handle that.
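For the JSON route, one option is to turn each XML string into a JSON string and then parse it with Spark's from_json (available since Spark 2.1) using a StructType schema. The converter below is a minimal hand-rolled sketch, not a library API: it maps attributes to keys prefixed with "@" (an arbitrary choice), leaf elements to JSON strings, and repeated sibling tags to arrays. It drops text inside mixed-content elements and emits every leaf as a string, so numeric fields would be declared as strings in the schema and cast afterwards.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element
import org.xml.sax.InputSource

object XmlToJson {
  // Escape a string as a JSON string literal.
  private def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case c if c < ' ' => sb.append("\\u%04x".format(c.toInt))
      case c => sb.append(c)
    }
    sb.append('"').toString
  }

  // Leaf elements without attributes become strings; everything else becomes
  // an object. Repeated child tags are collected into arrays.
  private def elementToJson(e: Element): String = {
    val attrs = e.getAttributes
    val nodes = e.getChildNodes
    val children = (0 until nodes.getLength)
      .map(i => nodes.item(i))
      .collect { case c: Element => c } // skips whitespace/text nodes
    val text = e.getTextContent.trim
    if (children.isEmpty && attrs.getLength == 0) quote(text)
    else {
      val attrFields = (0 until attrs.getLength).map { i =>
        val a = attrs.item(i)
        quote("@" + a.getNodeName) + ":" + quote(a.getNodeValue)
      }
      // Group children by tag name, keeping first-seen order.
      val childFields = children.map(_.getTagName).distinct.map { n =>
        val values = children.filter(_.getTagName == n).map(elementToJson)
        val v = if (values.size == 1) values.head else values.mkString("[", ",", "]")
        quote(n) + ":" + v
      }
      // Text of a leaf element that also carries attributes.
      val textField =
        if (children.isEmpty && text.nonEmpty) Seq(quote("#text") + ":" + quote(text))
        else Seq.empty[String]
      (attrFields ++ childFields ++ textField).mkString("{", ",", "}")
    }
  }

  // Convert one XML document to JSON, keeping the root tag as the top-level key.
  // Returns null for null input so a Spark UDF would yield a null cell.
  def convert(xml: String): String =
    if (xml == null) null
    else {
      val root = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)))
        .getDocumentElement
      "{" + quote(root.getTagName) + ":" + elementToJson(root) + "}"
    }
}
```

In Spark this could be applied as `udf((s: String) => XmlToJson.convert(s))` on xml_column, followed by `from_json(col("json"), schema)`, where json is a hypothetical alias for the UDF output and schema is a StructType you would define for your own XML; that wiring is not exercised here.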
I am using Spark 2.3.
Prerequisites:
You can make use of the following: