We are using DMS to migrate the data from On-prem to database to S3(Landing). Then we are using glue for transformation, after that it will be loaded in another S3(Raw).
Incremental load will happen based on DMS flag(I, U, D).
When there is a update in composite primary keys, DMS flag will be U. but after join it will be inserted to target instead of update.
For Example,
| Id | phone number | name |
|---|---|---|
| 1 | 888888888888 | x |
| 2 | 999999999999999 | y |
In the above table Id, Phone number are primary keys. after migration target also will have same records.
When there is update happened in source phone number from 888888888888 to 777777777
DMS will capture a CDC data like below
| Op | Id | phone number | name |
|---|---|---|---|
| U | 1 | 777777777 | x |
Based in Op column we can understand that there is updated happened in the source. Normally we will use Primary Key column for join and make changes in other column. But in this case update happened in Primary key column.
When I try to run the script based on Primary it is getting inserted as new row, instead of Updating the existing row. After that I am getting 3 rows but in the source I am having only 2 rows.
| Id | phone number | name |
|---|---|---|
| 1 | 888888888888 | x |
| 2 | 999999999999999 | y |
| 1 | 777777777 | x |
Ideally I should have only 2 rows in the target but I am getting 3. How can we handle this scenario.
We are using DMS to migrate the data from On-prem to database to S3(Landing). Then we are using glue for transformation, after that it will be loaded in another S3(Raw).
Incremental load will happen based on DMS flag(I, U, D).
When there is a update in composite primary keys, DMS flag will be U. but after join it will be inserted to target instead of update.
For Example,
| Id | phone number | name |
|---|---|---|
| 1 | 888888888888 | x |
| 2 | 999999999999999 | y |
In the above table Id, Phone number is primary keys. This is the data in source after migration target also will have same records.
When there is update happened in source phone number from 888888888888 to 777777777
DMS will capture a CDC data like below
| Op | Id | phone number | name |
|---|---|---|---|
| U | 1 | 777777777 | x |
Based in Op column we can understand that there is updated happened in the source. Normally we will use Primary Key column for join and make changes in other column. But in this case update happened in Primary key column.
When I try to run the script based on Primary it is getting inserted as new row, instead of Updating the existing row. After that I am getting 3 rows but in the source I am having only 2 rows.
| Id | phone number | name |
|---|---|---|
| 1 | 888888888888 | x |
| 2 | 999999999999999 | y |
| 1 | 777777777 | x |
Ideally I should have only 2 rows in the target but I am getting 3. How can we handle this scenario.