We are trying to set up `dev` and `qa` environments using data from a `prod` environment. We are not using `CREATE DATABASE dev CLONE prod` because we want to avoid cloning database-specific objects like stages and pipes: we use per-environment Terraform to manage pipe-related objects and want to avoid out-of-band changes to them. On top of that, there are some tables that should not be cloned from prod -> dev. I'm trying to design a cleaner solution than the cobbled-together mess that we have.
We have a scheduled script that does the following:

- Connect to the `prod` and `dev` databases and fetch the right `src` and `dst` schemas
- Run `SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '<>' AND TABLE_TYPE = 'BASE TABLE'` to get the tables to clone
- Cloning tables across databases results in dangling references to constraints and sequences, so those have to be manually cloned: https://docs.snowflake.com/en/user-guide/database-replication-considerations.html?#references-to-objects-in-another-database
- For each table (see the worked example after this list):
  - If it shouldn't be cloned, skip it
  - Run `CREATE OR REPLACE TABLE <dev> CLONE <prod> COPY GRANTS;`
  - Run `GET_DDL(<dev>)` to see if the table has sequences/constraints to update
  - Run `CREATE OR REPLACE SEQUENCE <dev> CLONE <prod>` to update the `nextval` of the sequence, since the cloned table references the sequence from the source database (and it has the wrong value anyway)
  - Run `ALTER TABLE <dev> ALTER COLUMN <> SET DEFAULT <new seq>.nextval`
  - Check if there are constraints:
    - Run `ALTER TABLE <dev> DROP CONSTRAINT <>`, since the cloned constraints reference the source database
    - Run `ALTER TABLE <dev> ADD CONSTRAINT <>` to rebuild them against the destination database
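To make the fix-up concrete, here is what the script ends up running for a single table. All of the object names (`orders`, `orders_seq`, `fk_orders_customer`, `customers`) are invented for illustration:

```sql
-- After the clone, dev.public.orders still defaults its id column to prod's
-- sequence, and its foreign key still references prod.

-- Re-clone the sequence so dev gets its own counter with the right NEXTVAL.
CREATE OR REPLACE SEQUENCE dev.public.orders_seq CLONE prod.public.orders_seq;

-- Point the column default at the dev sequence instead of the prod one.
ALTER TABLE dev.public.orders
  ALTER COLUMN id SET DEFAULT dev.public.orders_seq.NEXTVAL;

-- Constraints on the clone still reference the source database, so rebuild them.
ALTER TABLE dev.public.orders DROP CONSTRAINT fk_orders_customer;
ALTER TABLE dev.public.orders
  ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id) REFERENCES dev.public.customers (id);
```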
So... it works, but it's hacky, fragile, and needs constant updating because of all the custom rules. We currently run this on an AWS Lambda, but a first step would be to migrate it to pure Snowflake.
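Something like this is what I have in mind for the pure-Snowflake version. It's only a sketch: the `prod`/`dev` database names and the `AUDIT_LOG` exclusion are placeholders, and the per-table sequence/constraint fix-up above still has to be wired in after each clone:

```sql
CREATE OR REPLACE PROCEDURE dev.public.clone_schema_tables(src_schema VARCHAR, dst_schema VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
  tables RESULTSET;
  cloned INTEGER DEFAULT 0;
BEGIN
  -- Same table discovery as the current script: base tables only.
  tables := (SELECT table_name
             FROM prod.information_schema.tables
             WHERE table_schema = :src_schema
               AND table_type = 'BASE TABLE');
  FOR t IN tables DO
    IF (t.table_name = 'AUDIT_LOG') THEN  -- stand-in for the "should not be cloned" rules
      CONTINUE;
    END IF;
    -- Clone across databases, keeping grants, exactly as the Lambda does today.
    LET stmt VARCHAR := 'CREATE OR REPLACE TABLE dev.' || dst_schema || '."' || t.table_name ||
                        '" CLONE prod.' || src_schema || '."' || t.table_name || '" COPY GRANTS';
    EXECUTE IMMEDIATE :stmt;
    cloned := cloned + 1;
  END FOR;
  RETURN cloned || ' tables cloned';
END;
$$;
```

The schedule could then move off Lambda onto a task (warehouse and cron are placeholders too):

```sql
CREATE OR REPLACE TASK dev.public.refresh_dev_from_prod
  WAREHOUSE = dev_wh
  SCHEDULE = 'USING CRON 0 5 * * * UTC'
AS
  CALL dev.public.clone_schema_tables('PUBLIC', 'PUBLIC');

ALTER TASK dev.public.refresh_dev_from_prod RESUME;  -- tasks are created suspended
```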
Does anyone have any suggestions to improve this process? Or at least recommendations on Snowflake tools that could help simplify it?
I realise this is not really an answer to your question, but I would absolutely not do what you are proposing - it's not the way to manage your SDLC (in my opinion), and, especially if your data contains any PII, copying data from a Prod to a non-Prod database runs the risk of all sorts of regulatory and audit issues.
I would do the following:

- Hold the DDL for every database object as a script in your code repository

You can then use these scripts to build any environment, and you would change them by going through the standard Dev, Test, Prod SDLC.
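For example (table and column names invented), each object's DDL lives in the repo as a script that the deploy pipeline can run against whichever database it targets:

```sql
-- customers.sql: repo-managed DDL, applied to dev, test, or prod by the deploy pipeline.
CREATE TABLE IF NOT EXISTS public.customers (
    id          NUMBER IDENTITY,
    username    VARCHAR,
    email       VARCHAR,
    created_at  TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
```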
As far as populating these environments with data goes: if you really need Production-like data (and Production volumes of data), then you should build routines that copy the data from Prod to the chosen target environment and, where necessary, anonymise it. These scripts should be managed in your code repository, and your SDLC should include a requirement to build/update the relevant script for any new or changed table.
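A minimal sketch of one such routine, assuming an invented `customers` table with PII in `username` and `email`:

```sql
-- Rebuild dev's copy from prod, anonymising the PII columns on the way.
CREATE OR REPLACE TABLE dev.public.customers AS
SELECT
    id,
    'user_' || id AS username,   -- replace the real username with a synthetic one
    SHA2(email)   AS email,      -- hash rather than copy the real address
    created_at
FROM prod.public.customers;
```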