How to pass Azure ADLS Storage Account Name and Container Name to the spark.readStream


I have two storage accounts (STORAGE_ACCOUNT_A and STORAGE_ACCOUNT_B) under the same resource group, and I have set up a Spark Structured Streaming job with Auto Loader:

df = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.tenantId", "XXXX")
    .option("cloudFiles.subscriptionId", "XXXX")
    .option("cloudFiles.resourceGroup", "XXXX")
    .option("cloudFiles.clientId", "XXXX")
    .option("cloudFiles.clientSecret", "XXXX")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .schema(schema)
    .load(source_path))

But when I point source_path at the other storage account, it fails with the error below.

java.lang.IllegalStateException: The container in the file event {"create":{"bucket":"CONTAINER@STORAGE_ACCOUNT_B","key":"workbench/data/Landing/table/file.csv","size":800,"eventTime":1706107083508,"sequencer":"0000000000000000000000000002def9000000000000279c","newerThan$default$2":false}} is different from expected by the source: CONTAINER@STORAGE_ACCOUNT_A.

Auto Loader is unintentionally consuming file-creation events raised at the resource-group level, which breaks ingestion from the newly migrated storage account. How can we restrict Auto Loader to consume only the file events/queue belonging to the migrated storage account, so that data is processed correctly?
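One approach that is commonly suggested for this situation (an assumption on my part, not something confirmed in this thread) is to pre-provision the Event Grid subscription and Azure Queue Storage queue for each storage account yourself, and pin each stream to its own queue with the `cloudFiles.queueName` option, so Auto Loader skips its automatic notification setup and never sees events from other accounts. The queue name and paths below are hypothetical placeholders; a minimal sketch:

```python
# Sketch: one Auto Loader option dict per storage account, each pinned to
# its own pre-provisioned queue via cloudFiles.queueName. With an explicit
# queue, Auto Loader does not create resource-group-level notification
# resources itself, so events from other storage accounts are not consumed.
def autoloader_options(queue_name: str) -> dict:
    """Build Auto Loader options that read events only from `queue_name`."""
    return {
        "cloudFiles.format": "csv",
        "cloudFiles.useNotifications": "true",
        "cloudFiles.queueName": queue_name,  # hypothetical queue name
    }

opts_b = autoloader_options("queue-for-storage-account-b")

# Usage on a cluster (not runnable outside Spark):
# df = (spark.readStream.format("cloudFiles")
#       .options(**opts_b)
#       .option("header", "true")
#       .schema(schema)
#       .load(source_path))
```

This keeps one stream (and one checkpoint) per storage account, which also avoids the "container ... is different from expected by the source" check failing when a checkpoint created against STORAGE_ACCOUNT_A is reused for STORAGE_ACCOUNT_B.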


1 Answer

Answered by KKL:

This should really be a comment, but I don't have enough reputation to comment yet.

What you are trying to achieve does not seem to be supported in file notification mode.

You could try one of these two approaches:
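The answer is cut off here, so the two approaches are not shown. One fallback that is often used when file notification mode gets in the way (my assumption, not necessarily what the answerer intended) is directory listing mode, where Auto Loader lists the source path itself instead of consuming queue events, so notification resources at the resource-group level no longer matter:

```python
# Sketch: directory listing mode. With useNotifications off (the default),
# Auto Loader discovers new files by listing the input path, so no Event
# Grid subscription or queue is involved at all.
listing_opts = {
    "cloudFiles.format": "csv",
    # False is the default; set explicitly here for clarity.
    "cloudFiles.useNotifications": "false",
}

# On a cluster (not runnable outside Spark):
# df = (spark.readStream.format("cloudFiles")
#       .options(**listing_opts)
#       .option("header", "true")
#       .schema(schema)
#       .load(source_path))
```

Directory listing is slower on very large directories than notification mode, but it sidesteps the cross-account event problem entirely.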