I'm trying to read a SAP ABAP XML file with Spark using the Databricks 'spark-xml' jar.
My problem is that the output DataFrame schema is sorted alphabetically by default, and I want to maintain the order of the XML schema.
XML file:
<?xml version="1.0" encoding="utf-16"?><asx:abap xmlns:asx="http://www.sap.com/abapxml" version="1.0"><asx:values><TAB><item>...
Spark DataFrame:
df = spark.read.format('com.databricks.spark.xml')\
.option('rowTag', 'item')\
.option('encoding', 'UTF-16')\
.load("path/to/file/.xml")
Result:
df.printSchema()
root
|-- AEDAT: string (nullable = true)
|-- ASTNR: long (nullable = true)
|-- BWD: long (nullable = true)
...
Is there any option to not sort the result?
Thanks!
No, although you can always reorder the columns yourself:
df.select("thing_I_want_first", "thing_I_want_second")
This requires you to know the order in which they appear in the XML. (What if they don't always appear in the same order in the XML, though? The order would be ambiguous anyway. The ordering of columns in a DataFrame does not carry much meaning either.)
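If you don't want to type the column order by hand, one possible approach is to read it from the first row element of the XML and pass it to select. The following is only a sketch: the helper first_row_field_order and the sample document are hypothetical (the field names come from the schema in the question, but their order here is invented), and it assumes the fields are direct children of the 'item' element.

```python
import xml.etree.ElementTree as ET


def first_row_field_order(xml_text, row_tag="item"):
    """Return the child tag names of the first <row_tag> element, in document order."""
    root = ET.fromstring(xml_text)
    row = next(root.iter(row_tag), None)
    if row is None:
        return []
    order = []
    for child in row:
        # Drop a namespace part such as "{http://...}NAME" if present
        name = child.tag.split("}", 1)[-1]
        if name not in order:
            order.append(name)
    return order


# Hypothetical sample: same wrapper elements as in the question, invented field order
sample = (
    '<asx:abap xmlns:asx="http://www.sap.com/abapxml" version="1.0">'
    "<asx:values><TAB>"
    "<item><BWD>1</BWD><AEDAT>2020-01-01</AEDAT><ASTNR>7</ASTNR></item>"
    "</TAB></asx:values></asx:abap>"
)
columns = first_row_field_order(sample)
print(columns)  # ['BWD', 'AEDAT', 'ASTNR']

# With the DataFrame from the question, the reorder would then be:
# df = df.select(*columns)
```

Note that this only looks at the first item. If later items contain fields that the first one lacks, those columns would be dropped by the select, so you may want to append any remaining names from df.columns to the list first.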