By default, Snowflake comes with a 10 TB and 100 TB scale of TPC-DS datasets. Is it possible to get the same number of tables with fewer records/less volume(10 GB or even lesser than that)?

Thanks in advance!

3

There are 3 best solutions below

0
On

You can use the TPC-DS official kit (you can download it here) to generate the dataset at the scale factor you want (from 1 GB) and then upload it to your Snowflake account.

To generate the data to fill the tables you must use the dsdgen tool present in the kit.

0
On

You could clone the 10 TB database and then remove rows till you have your 10 GB of data.

Cloning docs: https://docs.snowflake.com/en/sql-reference/sql/create-clone.html

0
On

I did this with a CTAS for the year 2002 data. You should ORDER BY , otherwise the new table won't be well clustered. You will need to look at DATE_DIM to determine the date range, i.e. SELECT ... FROM JOIN DATE_DIM ON ... WHERE DATE_DIM.D_DATE ... ORDER BY D_DATE (or you can use the surrogate keys for the join, doesn't really matter). Or you could just create a VIEW with appropriate data ranges. Both are faster and cheaper than DELETE.