I have two storage accounts (STORAGE_ACCOUNT_A and STORAGE_ACCOUNT_B) under the same resource group, and I have set up a Spark streaming job with Auto Loader:
df = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.tenantId", "XXXX")
    .option("cloudFiles.subscriptionId", "XXXX")
    .option("cloudFiles.resourceGroup", "XXXX")
    .option("cloudFiles.clientId", "XXXX")
    .option("cloudFiles.clientSecret", "XXXX")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .schema(schema)
    .load(source_path))
But when I point source_path at the other storage account, it fails with the error below.
java.lang.IllegalStateException: The container in the file event {"create":{"bucket":"CONTAINER@STORAGE_ACCOUNT_B","key":"workbench/data/Landing/table/file.csv","size":800,"eventTime":1706107083508,"sequencer":"0000000000000000000000000002def9000000000000279c","newerThan$default$2":false}} is different from expected by the source: CONTAINER@STORAGE_ACCOUNT_A.
Auto Loader is unintentionally picking up file-creation events raised elsewhere in the resource group, which breaks ingestion from the newly migrated storage account. How can we restrict Auto Loader to consume only the file events/queue belonging to the migrated storage account, so that data is processed correctly?
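To make the failure mode concrete, here is a minimal sketch (container and path names are placeholders, not from the original post). As I understand it, Auto Loader's file notification source records which container@account it was created for in the stream's checkpoint, so reusing the checkpoint written for STORAGE_ACCOUNT_A while loading from STORAGE_ACCOUNT_B would produce exactly this mismatch; keeping one stream and one checkpoint per storage account avoids sharing that binding:

```python
# Hypothetical sketch: build per-account Auto Loader settings so that each
# storage account gets its own stream and its own checkpoint location.
# All names below (container, paths, option values) are placeholders.

def cloudfiles_options(tenant_id, subscription_id, resource_group,
                       client_id, client_secret):
    """cloudFiles options shared by every stream (values are placeholders)."""
    return {
        "cloudFiles.format": "csv",
        "cloudFiles.useNotifications": "true",
        "cloudFiles.tenantId": tenant_id,
        "cloudFiles.subscriptionId": subscription_id,
        "cloudFiles.resourceGroup": resource_group,
        "cloudFiles.clientId": client_id,
        "cloudFiles.clientSecret": client_secret,
    }

def paths_for(account):
    """Source and checkpoint paths derived per storage account, so no two
    streams ever share a checkpoint (the layout here is hypothetical)."""
    source = f"abfss://CONTAINER@{account}.dfs.core.windows.net/landing/table"
    checkpoint = f"abfss://checkpoints@{account}.dfs.core.windows.net/autoloader/table"
    return source, checkpoint
```

On Databricks this would be wired up roughly as `spark.readStream.format("cloudFiles").options(**cloudfiles_options(...)).schema(schema).load(source)` followed by `df.writeStream.option("checkpointLocation", checkpoint)...`; the key point is that the checkpoint used for STORAGE_ACCOUNT_B is never the one that was created against STORAGE_ACCOUNT_A.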
This should really be a comment, but I don't have enough reputation to comment yet.
What you are trying to achieve seems to be unsupported in file notification mode.
You could try one of the following two approaches: