SCD2 Implementation in Redshift using AWS Glue PySpark


I have a requirement to move data from S3 to Redshift. Currently I am using Glue for this work. Current requirement:

  1. Compare the primary key of each record in the Redshift table with the incoming file; if a match is found, close the old record (update its end date from the high date to the current date) and insert the new one.
  2. If no primary key match is found, insert the new record.

Implementation: I have implemented this in Glue using PySpark. I created dataframes covering three scenarios:

  1. Where a match is found, update the existing record's end date to the current date.
  2. Insert the new record into the Redshift table where a PPK match is found.
  3. Insert the new record into the Redshift table where a PPK match is not found.

Finally, union these three dataframes into one and write the result to the Redshift table.

With this approach, both the old record (which still has the high date value) and the new record (updated with the current date value) end up present in the table.
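For reference, a rough sketch of the dataframe logic described above; the column names (business_key, start_date, end_date), the high-date value, and the catalog database/table names are assumptions, not the actual schema:

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["TempDir"])
glueContext = GlueContext(SparkContext.getOrCreate())

HIGH_DATE = "9999-12-31"  # assumed "high date" that marks the open record

# Assumed sources: the current Redshift dimension table and the incoming S3
# feed, both registered in the Glue Data Catalog (names are illustrative).
existing_df = glueContext.create_dynamic_frame.from_catalog(
    database="dw", table_name="dim_customer",
    redshift_tmp_dir=args["TempDir"]).toDF()
incoming_df = glueContext.create_dynamic_frame.from_catalog(
    database="staging", table_name="s3_customer_feed").toDF()

# Keys present in both the existing table and the incoming file
matched_keys = (existing_df.select("business_key")
                .join(incoming_df.select("business_key"), "business_key")
                .distinct())

# 1. Existing open records whose key appears in the incoming file: close them
expired_df = (existing_df
              .join(matched_keys, "business_key", "left_semi")
              .where(F.col("end_date") == F.lit(HIGH_DATE))
              .withColumn("end_date", F.current_date().cast("string")))

# 2. Incoming records whose key already exists: the new open version
new_version_df = (incoming_df
                  .join(matched_keys, "business_key", "left_semi")
                  .withColumn("start_date", F.current_date().cast("string"))
                  .withColumn("end_date", F.lit(HIGH_DATE)))

# 3. Incoming records with no existing match: brand-new open records
brand_new_df = (incoming_df
                .join(matched_keys, "business_key", "left_anti")
                .withColumn("start_date", F.current_date().cast("string"))
                .withColumn("end_date", F.lit(HIGH_DATE)))

# Union the three frames (schemas are assumed to line up) and write back
final_df = expired_df.unionByName(new_version_df).unionByName(brand_new_df)
```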

Is there a way to delete the old record with the high date value using PySpark? Please advise.

There is 1 answer below.

We have successfully implemented similar functionality using AWS RDS [PostgreSQL] as the database service and Glue as the ETL service. My suggestion: instead of computing the delta in Spark dataframes, it is far easier and more elegant to create stored procedures and call them from the PySpark Glue job (for example: S3 bucket -> staging table -> target table).
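For the Redshift setup in the question, that pattern could look roughly like the sketch below: land the incoming file in a staging table and let the connector's preactions/postactions drive the merge. The connection name, table names, and the scd2_merge_customer stored procedure are illustrative; the procedure would be created in Redshift beforehand and hold the expire/insert logic from staging into the target table.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["TempDir"])
glueContext = GlueContext(SparkContext.getOrCreate())

# Incoming S3 data, assumed to be registered in the Glue Data Catalog
incoming_dyf = glueContext.create_dynamic_frame.from_catalog(
    database="staging", table_name="s3_customer_feed")

# Land the file in a staging table, then let a postaction call the stored
# procedure (scd2_merge_customer is hypothetical) that expires the matched
# rows in the target table and inserts the new versions from staging.
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=incoming_dyf,
    catalog_connection="redshift-connection",      # assumed Glue connection
    connection_options={
        "database": "dev",                          # assumed database name
        "dbtable": "staging.customer_feed",         # assumed staging table
        "preactions": "TRUNCATE TABLE staging.customer_feed;",
        "postactions": "CALL scd2_merge_customer();",
    },
    redshift_tmp_dir=args["TempDir"],
)
```

Because the merge then runs inside the database, the old open row is updated in place rather than a duplicate being written, which also addresses the delete question above.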

In addition, if your logic executes in under 10 minutes, I would suggest using a Python shell job with external libraries such as psycopg2 / SQLAlchemy for the DB operations.
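A minimal sketch of that Python shell variant with psycopg2 (the environment-variable names and the scd2_merge_customer procedure are placeholders, not part of the original setup):

```python
import os
import psycopg2

# Connection details are placeholders; in a real job they would come from
# AWS Secrets Manager or a Glue connection rather than environment variables.
conn = psycopg2.connect(
    host=os.environ["DB_HOST"],
    port=int(os.environ.get("DB_PORT", "5439")),
    dbname=os.environ["DB_NAME"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
)

try:
    with conn.cursor() as cur:
        # The staging table is assumed to be loaded already (e.g. via a COPY
        # from S3); the hypothetical procedure holds the SCD2 merge logic.
        cur.execute("CALL scd2_merge_customer();")
    conn.commit()
finally:
    conn.close()
```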