Working with Datawarehouse for Multi-tenancy

382 Views Asked by At

We have Multi-tenancy application where we have separate database for each client in MongoDB, PostgreSQL & separate Index in ElasticSearch.

Now we are planning for Datawarehouse to be created in Amazon-RedShift Serverless & ingest all data from above databases into it as a separate database for each client.

We are planning to use AWS-Glue for the same. To ingest data from multiple data sources and multiple databases within these data sources.

Would like to know the approach(es) we should go for the this use case.

We are thinking the ETL process like this

  • File with all data source and database in S3 bucket
  • Credentials in AWS Secret Manager
  • Loop through the source file
  • Create database in Redshift
  • Ingest raw data from multiple data sources for single client.
  • As it is RDBMS, we can maintain data normalization

Is this the right approach?

1

There are 1 best solutions below

1
On

I have couple of questions -

  1. Are these live databases?
  2. Are you already on AWS?

If moving from on premise, it might be better to leverage amazon direct connect (pricing here https://aws.amazon.com/directconnect/pricing/) and use Glue to connect directly to on-premise DBs and ingest data to Datawarehouse, instead of creating another staging in S3.

All the best.