Handling Incremental Refresh in AWS AppFlow


I'm building a data warehouse with the following architecture: Salesforce -> AppFlow -> S3 -> Glue -> RDS (SQL Server). However, I've run into a problem implementing incremental refresh in AppFlow.

The problem is that with incremental refresh selected, AppFlow does not update a changed record in place. When an existing row changes in Salesforce, AppFlow writes the updated version as a new row in a new Parquet file, and the old row stays in its original file. This duplicates records in S3, so the row count in S3 no longer matches the Salesforce object. For example, the Salesforce object has 100 rows, but the Parquet files in S3 contain 110 rows in total.
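To make the mismatch concrete, here is a minimal Python sketch of a common staging-layer pattern: keep only the most recently modified version of each record. It assumes every row carries the Salesforce record `Id` and a `LastModifiedDate` timestamp (both standard Salesforce fields, but whether your flow maps them is an assumption). The sample rows are hypothetical, and reading the Parquet files from S3 is not shown. This is not an AppFlow setting; in this architecture the same keep-latest-per-Id logic could plausibly run in the Glue job before loading into SQL Server.

```python
from datetime import datetime

# Hypothetical rows, as they might look after reading every Parquet file
# the incremental flow has written to S3. Values are made up.
rows = [
    {"Id": "001A", "Name": "Acme", "LastModifiedDate": "2023-06-01T10:00:00Z"},
    {"Id": "001B", "Name": "Globex", "LastModifiedDate": "2023-06-01T10:00:00Z"},
    # Later run: 001A changed in Salesforce, so a second copy lands in a new file.
    {"Id": "001A", "Name": "Acme Corp", "LastModifiedDate": "2023-06-02T09:30:00Z"},
]


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_per_id(records):
    """Return one row per Id: the version with the newest LastModifiedDate."""
    latest = {}
    for rec in records:
        current = latest.get(rec["Id"])
        if current is None or parse_ts(rec["LastModifiedDate"]) > parse_ts(
            current["LastModifiedDate"]
        ):
            latest[rec["Id"]] = rec
    return list(latest.values())


deduped = latest_per_id(rows)
print(len(rows), len(deduped))  # 3 2
```

The raw row count (3) exceeds the number of distinct records (2), which is the same kind of gap as 110 vs. 100. This sketch does not handle records deleted in Salesforce; those would need separate treatment.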

How should I handle this so that there are no duplicates and the rows in S3 match the rows in the Salesforce object? Any suggestions on configuring incremental refresh in AppFlow for this data warehousing architecture would be greatly appreciated.
