repartition not working with xml file in Spark

108 Views Asked by Nemanja At 14 July 2023 at 23:05

I have a DataFrame that I want to save as multiple XML files. This is my code:

employees
        .repartition(col("first_name"))
        .write()
        .option("maxRecordsPerFile", 5)
        .mode(SaveMode.Overwrite)
        .partitionBy("first_name")
        .format("xml")
        .save("C:/spark_output/");

I'm expecting to see output like this:

spark_output/
  first_name=Alex
    part-00000.xml
    part-00001.xml
  first_name=Mike
    part-00000.xml
    part-00001.xml
  first_name=Nicole
    part-00000.xml
    part-00001.xml

But the output contains only one file with 10 rows.

I don't understand what is going on. How can I fix this?

Any advice would be highly appreciated. Thanks


There is 1 best solution below

Zach King On 15 July 2023 at 16:14

.partitionBy is not supported by spark-xml (Databricks' open-source XML data sink), and it does not appear to be on the project's roadmap on GitHub:

https://github.com/databricks/spark-xml/issues/327
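
Since partitionBy is not available for this sink, one possible workaround is to do the partitioning by hand: write each first_name value into its own directory, and pick the number of partitions per value so that no file exceeds the desired record count. The plain-Java sketch below covers only that planning arithmetic and has no Spark dependency. The class XmlOutputPlan, its methods, and the per-name row counts are hypothetical and used purely for illustration. In a real job, each plan entry would presumably become a filter on first_name, a repartition(n), and a save into that directory. Whether spark-xml then emits exactly one part file per partition is an assumption worth checking against the version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: plans a hand-rolled "partitionBy" layout.
class XmlOutputPlan {

    /** Files needed so that no file holds more than maxPerFile rows. */
    static int filesNeeded(long rowCount, int maxPerFile) {
        if (rowCount < 0 || maxPerFile <= 0) {
            throw new IllegalArgumentException("rowCount must be >= 0 and maxPerFile > 0");
        }
        // Ceiling division without floating point.
        return (int) ((rowCount + maxPerFile - 1) / maxPerFile);
    }

    /** Maps each output directory (key=value naming) to its planned file count. */
    static Map<String, Integer> plan(String baseDir, String column,
                                     Map<String, Long> rowsPerKey, int maxPerFile) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : rowsPerKey.entrySet()) {
            String dir = baseDir + "/" + column + "=" + entry.getKey();
            result.put(dir, filesNeeded(entry.getValue(), maxPerFile));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up row counts per first_name; in Spark these could come from
        // a groupBy("first_name").count() on the DataFrame.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("Alex", 7L);
        counts.put("Mike", 2L);
        counts.put("Nicole", 1L);

        plan("spark_output", "first_name", counts, 5)
                .forEach((dir, n) -> System.out.println(dir + " -> " + n + " file(s)"));
    }
}
```

With these made-up counts, Alex (7 rows) is planned for 2 files, while Mike and Nicole get 1 file each. Note that a value only gets a second file when it has more than 5 rows, so 10 rows spread over three names cannot yield two files for every name.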
