Write dataframe without column names as part of the file path


I have to write a Spark dataframe to a path of the format base_path/{year}/{month}/{day}/{hour}/. If I do something like below:

pc = ["year", "month", "day", "hour"]
df.write.partitionBy(*pc).parquet("base_path/", mode = 'append')

it creates the location as base_path/year=2022/month=04/day=25/hour=10/. I do not want the column names year, month, day, and hour to be part of the path; I want something like base_path/2022/04/25/10/ instead. Is there a solution for this?

1 Answer

The column names are written as part of the path because the partition column values are not stored in the data files themselves; the Hive-style column=value segments in the path are what allow Spark to reconstruct those columns when reading the data back.
For more information about this see here.
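To illustrate the convention, here is a minimal sketch of how column values are encoded in a Hive-style path. parse_hive_path is a hypothetical helper for illustration, not a Spark API; Spark performs this partition discovery internally when it reads a partitioned directory tree.

```python
def parse_hive_path(relative_path):
    """Recover partition column values from segments like 'year=2022/month=04'."""
    parts = {}
    for segment in relative_path.strip("/").split("/"):
        if "=" in segment:
            column, value = segment.split("=", 1)
            parts[column] = value
    return parts

print(parse_hive_path("year=2022/month=04/day=25/hour=10/"))
# {'year': '2022', 'month': '04', 'day': '25', 'hour': '10'}
```

With a plain path like 2022/04/25/10/ there is no column name to recover, which is why Spark cannot rebuild the partition columns from it.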

If you still want to write the data with the path layout above, you can issue multiple write commands, each with an explicit path and a filter on the corresponding partition values.
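A sketch of that multi-write approach, under the question's base_path and column names. partition_path is a hypothetical helper that builds the desired layout; the Spark loop is shown in comments because it needs a running Spark session, and it mirrors the question's write but filters each partition and drops the partition columns (their values live only in the path, so they are lost unless re-added on read).

```python
def partition_path(base_path, year, month, day, hour):
    """Build a base_path/2022/04/25/10/ style path with no column names."""
    return f"{base_path.rstrip('/')}/{year}/{month:02}/{day:02}/{hour:02}/"

# With a Spark session available, the per-partition writes would look like:
#
# pc = ["year", "month", "day", "hour"]
# for row in df.select(*pc).distinct().collect():
#     path = partition_path("base_path", row.year, row.month, row.day, row.hour)
#     (df.filter((df.year == row.year) & (df.month == row.month) &
#                (df.day == row.day) & (df.hour == row.hour))
#        .drop(*pc)  # values are encoded in the path, not in the files
#        .write.parquet(path, mode="append"))

print(partition_path("base_path/", 2022, 4, 25, 10))
# base_path/2022/04/25/10/
```

One write per distinct partition combination is noticeably slower than a single partitionBy write, so this trades performance for the custom path layout.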
The current logic for determining the partition path is located here, and there doesn't seem to be a way to replace it in a pluggable fashion (you could technically load a different implementation into the JVM or write your own writer implementation, but I would not recommend that).