By default, Snowflake comes with TPC-DS datasets at the 10 TB and 100 TB scale factors. Is it possible to get the same set of tables with fewer records/less volume (10 GB or even less than that)?
Thanks in advance!
You could clone the 10 TB database and then delete rows until you are down to your 10 GB of data.
Cloning docs: https://docs.snowflake.com/en/sql-reference/sql/create-clone.html
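A minimal sketch of that approach, with hypothetical database/table names (note that some accounts may not permit cloning a shared sample database, in which case you would need your own copy of the data first):

```sql
-- Zero-copy clone of the sample data (names are illustrative).
CREATE DATABASE tpcds_small CLONE snowflake_sample_data;

-- Then trim each table down, e.g. keep only one year of sales:
DELETE FROM tpcds_small.tpcds_sf10tcl.store_sales
WHERE ss_sold_date_sk NOT IN (
    SELECT d_date_sk
    FROM tpcds_small.tpcds_sf10tcl.date_dim
    WHERE d_year = 2002
);
```

You would repeat the DELETE for each fact table; as the next answer notes, large DELETEs on a 10 TB clone can be slow and expensive.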
I did this with a CTAS for the year 2002 data. You should include an ORDER BY, otherwise the new table won't be well clustered. You will need to look at DATE_DIM to determine the date range, i.e. SELECT ... FROM ... JOIN DATE_DIM ON ... WHERE DATE_DIM.D_DATE ... ORDER BY D_DATE (or you can use the surrogate keys for the join; it doesn't really matter). Alternatively, you could just create a VIEW with the appropriate date ranges. Both are faster and cheaper than DELETE.
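A concrete version of that CTAS, assuming the 10 TB sample share is mounted as SNOWFLAKE_SAMPLE_DATA and using STORE_SALES as the example fact table (target database and schema names are illustrative):

```sql
-- Keep only 2002 sales; join on the date surrogate key and
-- ORDER BY the date so the new table is well clustered.
CREATE TABLE my_db.my_schema.store_sales AS
SELECT ss.*
FROM snowflake_sample_data.tpcds_sf10tcl.store_sales ss
JOIN snowflake_sample_data.tpcds_sf10tcl.date_dim d
  ON ss.ss_sold_date_sk = d.d_date_sk
WHERE d.d_year = 2002
ORDER BY d.d_date;
```

The same pattern applies to the other fact tables; dimension tables are small enough to copy whole.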
You can use the official TPC-DS kit (you can download it here) to generate the dataset at the scale factor you want (starting from 1 GB) and then upload it to your Snowflake account.
To generate the data that fills the tables, use the dsdgen tool included in the kit.
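For example, after building the kit, running `dsdgen -SCALE 1 -DIR /tmp/tpcds` produces pipe-delimited `.dat` files, one per table, which you can then stage and load. A sketch of the load for one table (stage, file format, and table names here are illustrative; PUT compresses the file to `.gz` by default):

```sql
-- dsdgen output is '|'-delimited text.
CREATE OR REPLACE FILE FORMAT tpcds_fmt
  TYPE = CSV
  FIELD_DELIMITER = '|';

CREATE OR REPLACE STAGE tpcds_stage
  FILE_FORMAT = tpcds_fmt;

-- Upload the generated file from your machine (run via SnowSQL).
PUT file:///tmp/tpcds/store_sales.dat @tpcds_stage;

-- Load it into the target table; repeat per table.
COPY INTO store_sales
FROM @tpcds_stage/store_sales.dat.gz
FILE_FORMAT = (FORMAT_NAME = tpcds_fmt);
```

You would first create the empty tables from the DDL shipped with the kit, then repeat the PUT/COPY pair for each generated file.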