I have a DataFrame which I want to save as multiple XML files. This is my code:
employees
.repartition(col("first_name"))
.write()
.option("maxRecordsPerFile", 5)
.mode(SaveMode.Overwrite)
.partitionBy("first_name")
.format("xml")
.save("C:/spark_output/");
I'm expecting to see output like this:
spark_output/
first_name=Alex
part-00000.xml
part-00001.xml
first_name=Mike
part-00000.xml
part-00001.xml
first_name=Nicole
part-00000.xml
part-00001.xml
But the output contains only one file with 10 rows.
I don't understand what is going on. How can I fix this?
Any advice would be highly appreciated. Thanks
.partitionBy is not supported by spark-xml (Databricks' open-source XML data source) and does not appear to be on the roadmap for the project on GitHub: https://github.com/databricks/spark-xml/issues/327
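As a workaround, you can reproduce the partitioned layout yourself: collect the distinct first_name values and write one XML output directory per value. Below is a minimal sketch using the same Java API as your snippet; the writePerFirstName method name and the rowTag value are just illustrative assumptions, and collecting the distinct values to the driver is only reasonable when the column has low cardinality.

import static org.apache.spark.sql.functions.col;

import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class XmlPartitionWorkaround {

    // Hypothetical helper: writes one spark-xml output directory per distinct first_name.
    public static void writePerFirstName(Dataset<Row> employees, String baseDir) {
        // Collect the distinct names to the driver; fine for a low-cardinality column.
        List<Row> names = employees.select("first_name").distinct().collectAsList();

        for (Row row : names) {
            String name = row.getString(0);
            employees
                .filter(col("first_name").equalTo(name))
                .write()
                .option("rowTag", "employee")    // spark-xml row element name (assumed)
                .option("maxRecordsPerFile", 5)  // carried over from the question; whether spark-xml honors it may depend on the version
                .mode(SaveMode.Overwrite)
                .format("xml")
                .save(baseDir + "/first_name=" + name);
        }
    }
}

The resulting directory tree mirrors the first_name=... layout you expected from .partitionBy, with the loop playing the role of the missing partitioner.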