I have a postgresql DB which has tables containing multi-tenant data. Example:
Table Customer
CustId | Cust Name |
---|---|
First | CustA |
Second | CustB |
Table CustomerShoppingDet
Item Name | CustId. | Buying Date |
---|---|---|
First | CustA | 02-13-2023 |
Second | CustA | 03-14-2023 |
First. | CustB | 04-14-2023 |
These are all on AWS RDS scattered across regions. Lets assume, CustA
probably exists only in us-east region. Now I have a use case wherein I need to move CustA
from us-east (say - source) to eu-central region RDS (say - destination). So, I need to migrate all CustA
related data.
The data per customer is quite large and spans across multiple tables (~18GB per customer). We are thinking of replicating a snapshot as Phase I and using CDC I will sync all delta updates from source to destination until the D-Day when the switch happens from source to destination. We will disable CustA on source for a brief time before enabling it at the destination.
This is primarily for 2 reasons:
- Minimise the overhead of exporting the data for CustA in multiple chunks and phases. Not all tables have a
modified_date
column - Minimise the overall
switch
time from source to destination
Any suggestion on a suitable CDC replication mechanism that I can use here. I have gone through Debezium and AWS DMS offering but I am not completely sure if they will allow filtering on only CustA
records.
I can even build my own export, transform and load pipeline if that can offer me greater flexibility.
You can use logical decoding in PostgreSQL v15 or better, where
CREATE PUBLICATION
supports aWHERE
condition. You need to define a publication for each tenant. Then the subscriber will only receive changes for that tenant.