Can Databricks Auto Loader infer partitions?

By default, when your source uses a Hive-style partition directory structure, the Auto Loader option cloudFiles.partitionColumns adds these columns to your schema automatically (via schema inference).
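For illustration, a minimal sketch, with assumed column names, of declaring the partition columns explicitly via cloudFiles.partitionColumns instead of relying on inference:

checkpoint_path = "s3://dev-bucket/_checkpoint/dev_table"

df = (
    spark
    .readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    # schema inference needs a schema location
    .option("cloudFiles.schemaLocation", checkpoint_path)
    # hypothetical column names matching the directory layout
    .option("cloudFiles.partitionColumns", "year,month,day")
    .load("s3://autoloader-source/json-data")
)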

This is my current code:

checkpoint_path = "s3://dev-bucket/_checkpoint/dev_table"
(
    spark
    .readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .load("s3://autoloader-source/json-data")
    .writeStream
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)
    .toTable("dev_catalog.dev_database.dev_table")
)

But is there an option to also create partitions on the target table, as you can with a plain CREATE TABLE? (E.g. if you have the classic /year=xxxx/month=xxx/day=xx structure.)

1 Answer

Best answer:

You can use the .partitionBy() method on the stream writer:

checkpoint_path = "s3://dev-bucket/_checkpoint/dev_table"
(
    spark
    .readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .load("s3://autoloader-source/json-data")
    .writeStream
    .option("checkpointLocation", checkpoint_path)
    # partition the target table by the chosen columns
    .partitionBy("col1", "col2")
    .trigger(availableNow=True)
    .toTable("dev_catalog.dev_database.dev_table")
)
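
For the question's /year=xxxx/month=xxx/day=xx layout, a sketch (assuming those column names end up in the schema, either via inference or by declaring them with cloudFiles.partitionColumns) could look like:

checkpoint_path = "s3://dev-bucket/_checkpoint/dev_table"
(
    spark
    .readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    # declare the directory-derived columns explicitly (inference can also pick them up)
    .option("cloudFiles.partitionColumns", "year,month,day")
    .load("s3://autoloader-source/json-data")
    .writeStream
    .option("checkpointLocation", checkpoint_path)
    # physically partition the target table by the same columns
    .partitionBy("year", "month", "day")
    .trigger(availableNow=True)
    .toTable("dev_catalog.dev_database.dev_table")
)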