I'm encountering an issue with a MERGE operation in a notebook that reads tables from a lakehouse. The merge fails with a duplicate error, yet when I query the table through SQL Server Management Studio (SSMS) connected to the lakehouse, it shows zero duplicates. Suspecting a caching problem, I tried disabling the cache with the following code:
spark.conf.set("spark.synapse.vegas.useCache", "false")
df.cache()
df.unpersist()
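For completeness, I also considered clearing the whole session cache and forcing a metadata refresh on the target table, along these lines (a sketch, assuming Scala and the lakehouse2.table1 target from the merge sample below):

// Drop every cached table and DataFrame in this Spark session
spark.catalog.clearCache()
// Invalidate any cached metadata/file listings for the merge target
spark.sql("REFRESH TABLE lakehouse2.table1")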
I also switched environments manually within the notebook and again found no duplicates. The perplexing part is that the issue only occurs when the notebook is triggered via a pipeline; when I run it manually there are no duplicates. What could cause this discrepancy, and how can it be addressed?
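To narrow it down, my next step is to log the duplicate key count from inside the pipeline-triggered run itself, something like this (a sketch, assuming Scala and the source view from the merge sample below):

// Count source keys that occur more than once at the moment the pipeline runs
val dupes = spark.sql("""
  SELECT Col1, Col2, COUNT(*) AS cnt
  FROM ViewCreatedOnTable2InAnotherLakeHouse
  GROUP BY Col1, Col2
  HAVING COUNT(*) > 1
""")
println(s"Duplicate source keys seen by this run: ${dupes.count()}")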
spark.conf.set("spark.synapse.vegas.useCache", "false")
df.cache()
df.unpersist()
The sample merge code looks like this:
%%sql
MERGE INTO lakehouse2.table1 AS Target
USING ViewCreatedOnTable2InAnotherLakeHouse AS Source
ON Target.Col1 = Source.Col1 AND Target.Col2 = Source.Col2
WHEN MATCHED THEN UPDATE SET
  Target.Col3 = Source.Col3,
  Target.Col4 = Source.Col4
WHEN NOT MATCHED THEN INSERT
  (Col1, Col2, Col3, Col4)
  VALUES (Source.Col1, Source.Col2, Source.Col3, Source.Col4)
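Since Delta's MERGE raises an error when more than one source row matches the same target row on the ON clause, one workaround I'm considering is de-duplicating the source inline, along these lines (a sketch reusing the names from the sample above; ORDER BY Col3 is just a placeholder tie-break):

%%sql
MERGE INTO lakehouse2.table1 AS Target
USING (
  -- keep exactly one row per (Col1, Col2) key
  SELECT Col1, Col2, Col3, Col4
  FROM (
    SELECT v.*, ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY Col3) AS rn
    FROM ViewCreatedOnTable2InAnotherLakeHouse v
  ) ranked
  WHERE rn = 1
) AS Source
ON Target.Col1 = Source.Col1 AND Target.Col2 = Source.Col2
WHEN MATCHED THEN UPDATE SET
  Target.Col3 = Source.Col3,
  Target.Col4 = Source.Col4
WHEN NOT MATCHED THEN INSERT
  (Col1, Col2, Col3, Col4)
  VALUES (Source.Col1, Source.Col2, Source.Col3, Source.Col4)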
The notebook uses Spark SQL and Scala.