Clustering table with many columns

55 Views Asked by Omer Biber At 19 December 2021 at 18:34

I am having ETL runtime issues with a specific table that has ~100K rows and 650 columns most of them are ints (others are strings or dates). The table is clustered by 2 string columns, and the logic of the etl is mainly built with "Lag", "Coalesce", "Case" and "Least" commands. Lags are partitioned by the same columns as the ones in the table's clustering. The runtime of this table's step in the ETL is suffering from a very long duration. I am not highly familiar with cost effectiveness logic of Snowflake, and debugging this issue without knowing where to start looking takes too long (since running the query takes about an hour!) Any suggestions on where to start / reading materials that can help me solving this issue faster?

Original Q&A

There are 1 best solutions below

peterb On 19 December 2021 at 21:44

Cluster keys in snowflake work by reducing the number of micropartitions needed to scan. Since your table is very small, there is really no point in using a cluster key, and it will likely not have any impact on performance. I recommend looking at the query profile when the ETL step is running to see where the time is spent.

https://docs.snowflake.com/en/user-guide/ui-query-profile.html

Also, have you tried scaling to a larger warehouse?

Clustering table with many columns

There are 1 best solutions below

Related Questions in RUNTIME

Related Questions in SNOWFLAKE-CLOUD-DATA-PLATFORM

Related Questions in CLUSTERING-KEY

Trending Questions

Popular # Hahtags

Popular Questions