Naming the CSV file in write.df


I am writing a file in SparkR using write.df, but I am unable to specify the file name:

Code:

write.df(user_log0, path = "Output/output.csv",
         source = "com.databricks.spark.csv", 
         mode = "overwrite",
         header = "true")

Problem:

I expect a file called 'output.csv' inside the 'Output' folder, but instead I get a folder called 'output.csv' containing a file called 'part-00000-6859b39b-544b-4a72-807b-1b8b55ac3f09.csv'.

What am I doing wrong?

P.S: R 3.3.2, Spark 2.1.0 on OSX


Accepted answer:

Because of Spark's distributed nature, you can only specify the directory into which the files are saved; each executor writes its own file into that directory using Spark's internal naming convention.

If you see only a single file, your data has a single partition, so only one executor is writing. This is not typical Spark behavior; however, if it fits your use case, you can collect the result into a local R data frame and write the CSV from there.
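A minimal SparkR sketch of that workaround, assuming an active Spark session and that the data is small enough to fit in the driver's memory:

```r
# Collect the distributed SparkDataFrame into a local R data.frame
# (only safe when the data fits in the driver's memory)
local_df <- collect(user_log0)

# Write a single CSV with exactly the name you want
write.csv(local_df, file = "Output/output.csv", row.names = FALSE)
```

This sidesteps Spark's distributed writer entirely, so it only makes sense for small results.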

In the more general case, where the data is distributed across multiple executors, you cannot set a specific name for the output files.
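When single-partition output is acceptable, another common workaround is to let Spark write its directory as usual and then move the lone part file to the desired name from the shell. A sketch of that cleanup step (the directory layout below is simulated for illustration; the part-file hash differs on every run, hence the wildcard):

```shell
# Simulate the layout Spark produces: a directory named like the "file",
# containing a single part file (the hash shown here is illustrative)
mkdir -p Output/output.csv
printf 'user,count\nalice,3\n' > Output/output.csv/part-00000-example.csv

# Move the single part file up to the name we actually wanted
mv Output/output.csv/part-00000-*.csv Output/output.tmp
rm -r Output/output.csv
mv Output/output.tmp Output/output.csv

cat Output/output.csv
```

Note the wildcard only works safely when there is exactly one part file, i.e. one partition.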