I have an XML file like this:
<IdentUebersetzungen>
<IdentUebersetzung IdentUebersetzungName="ABT">
<Lables>
<Lable ServiceShortName="TABROW_OperaAndDisplUnit1SparePartNumbe" LableName="SGIDK2_HW"/>
<Lable ServiceShortName="TABROW_OperaAndDisplUnit1HardwNumbe" LableName="SGIDK2"/>
<Lable ServiceShortName="TABROW_OperaAndDisplUnit1AppliSoftwVersiNumbe" LableName="ZIF"/>
<Lable ServiceShortName="TABROW_OperaAndDisplUnit1HardwVersiNumbe" LableName="BRIF"/>
<Lable ServiceShortName="TABROW_OperaAndDisplUnit1SeriaNumbe" LableName="SERNR"/>
</Lables>
</IdentUebersetzung>
<IdentUebersetzung IdentUebersetzungName="Batt">
<Lables>
<Lable ServiceShortName="Batt_ECUHardwNumbe" LableName="SGIDK2_HW"/>
<Lable ServiceShortName="Batt_SparePartNumbe" LableName="SGIDK2"/>
<Lable ServiceShortName="Batt_ApplSwVerCount" LableName="ZIF"/>
<Lable ServiceShortName="Batt_ECUHardwVersiNumbe" LableName="BRIF"/>
<Lable ServiceShortName="Batt_SeriaNumbe" LableName="SERNR"/>
</Lables>
</IdentUebersetzung>
</IdentUebersetzungen>
I am reading it with Spark-XML (package com.databricks:spark-xml_2.12:0.15.0):
df = (
    spark.read.format("com.databricks.spark.xml")
    .option("rowTag", "IdentUebersetzung")  # rowTag must be a quoted string
    .option("attributePrefix", "")           # no prefix on attribute column names
    .load("xxxxxx")
)
df.show()
I got the following output:
+---------------------+------------+
|IdentUebersetzungName|Lables      |
+---------------------+------------+
|ABT                  |{null, null}|
|Batt                 |{null, null}|
+---------------------+------------+
Can someone tell me why the column "Lables" contains only null values?
I want the XML attribute values of "IdentUebersetzungName", "ServiceShortName" and "LableName" in the DataFrame. Can I do this with Spark-XML?
I tried with com.databricks:spark-xml_2.12:0.15.0, and it does not seem to handle nested XML very well.
When attributePrefix is set to "", the nested elements are not parsed properly, which may be a bug. Otherwise, you can try the code below to achieve the same result.
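If Spark-XML keeps returning nulls for the nested attributes, one fallback that sidesteps it entirely is to flatten the attributes with Python's standard-library xml.etree.ElementTree and then hand the rows to Spark. This is a swapped-in technique, not Spark-XML: it parses the whole file on the driver, so it is only a reasonable sketch for small files. The function name flatten_labels and the abbreviated sample XML are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, abbreviated from the question (two Lable entries per block).
XML = """<IdentUebersetzungen>
  <IdentUebersetzung IdentUebersetzungName="ABT">
    <Lables>
      <Lable ServiceShortName="TABROW_OperaAndDisplUnit1SparePartNumbe" LableName="SGIDK2_HW"/>
      <Lable ServiceShortName="TABROW_OperaAndDisplUnit1HardwNumbe" LableName="SGIDK2"/>
    </Lables>
  </IdentUebersetzung>
  <IdentUebersetzung IdentUebersetzungName="Batt">
    <Lables>
      <Lable ServiceShortName="Batt_ECUHardwNumbe" LableName="SGIDK2_HW"/>
      <Lable ServiceShortName="Batt_SparePartNumbe" LableName="SGIDK2"/>
    </Lables>
  </IdentUebersetzung>
</IdentUebersetzungen>"""


def flatten_labels(xml_text):
    """Hypothetical helper: one (IdentUebersetzungName, ServiceShortName, LableName) tuple per <Lable>."""
    root = ET.fromstring(xml_text)
    rows = []
    # Direct <IdentUebersetzung> children of the root element.
    for ident in root.findall("IdentUebersetzung"):
        name = ident.get("IdentUebersetzungName")
        # Relative path: every <Lable> inside this block's <Lables>.
        for lable in ident.findall("Lables/Lable"):
            rows.append((name, lable.get("ServiceShortName"), lable.get("LableName")))
    return rows


if __name__ == "__main__":
    for row in flatten_labels(XML):
        print(row)
```

In a Spark session, the resulting list could then be turned into a DataFrame with `spark.createDataFrame(rows, ["IdentUebersetzungName", "ServiceShortName", "LableName"])`, giving one row per Lable element with the parent's IdentUebersetzungName repeated.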