I am having ETL runtime issues with one table of ~100K rows and 650 columns. Most of the columns are ints; the rest are strings or dates. The table is clustered by 2 string columns, and the ETL logic is built mainly from LAG, COALESCE, CASE and LEAST expressions. The LAGs are partitioned by the same columns as the table's clustering key. This table's step in the ETL takes very long: a single run of the query takes about an hour. I am not very familiar with Snowflake's cost and performance model, and debugging without knowing where to start looking is slow precisely because each run takes so long. Any suggestions on where to start, or reading material that could help me solve this faster?
Clustering table with many columns
Asked by Omer Biber
Cluster keys in Snowflake work by reducing the number of micro-partitions that need to be scanned. Since your table is very small, there is really no point in using a cluster key, and it will likely have no impact on performance. I recommend looking at the query profile for the ETL step to see where the time is spent.
https://docs.snowflake.com/en/user-guide/ui-query-profile.html
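If you prefer SQL over the UI, the same information can be pulled with a few queries. This is only a rough sketch: `my_db.my_schema.my_table` and the `<query_id>` placeholder are hypothetical and need to be replaced with your own names, and the role running the queries is assumed to have access to the table and to the query history.

```sql
-- Hypothetical object name: replace my_db.my_schema.my_table with your table.
-- 1. Check how many micro-partitions the table has (total_partition_count in
--    the returned JSON). For ~100K rows this is expected to be very small,
--    which is why the cluster key is unlikely to help.
SELECT SYSTEM$CLUSTERING_INFORMATION('my_db.my_schema.my_table');

-- 2. Find the slow ETL statement among recent queries.
SELECT query_id,
       query_text,
       total_elapsed_time / 1000 AS elapsed_seconds,
       compilation_time   / 1000 AS compilation_seconds,
       execution_time     / 1000 AS execution_seconds,
       bytes_scanned
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 100))
ORDER BY total_elapsed_time DESC
LIMIT 10;

-- 3. Break the slow query down by operator (the data behind the query profile).
--    '<query_id>' is a placeholder for the ID found in step 2.
SELECT operator_id,
       operator_type,
       execution_time_breakdown:overall_percentage::FLOAT AS pct_of_total_time,
       operator_statistics
FROM TABLE(GET_QUERY_OPERATOR_STATS('<query_id>'))
ORDER BY pct_of_total_time DESC;
```

If a high compilation time shows up in step 2, or a single operator such as a window function dominates in step 3, that points to where the hour is going.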
Also, have you tried scaling to a larger warehouse?
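A quick way to test this is to resize the warehouse for a single run and compare the timings. The sketch below assumes a warehouse named `etl_wh` that is currently `MEDIUM`; both the name and the sizes are hypothetical.

```sql
-- Hypothetical warehouse name and sizes: adjust to your environment.
-- Larger warehouses consume more credits per hour, so resize back afterwards.
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'LARGE';

-- ... rerun the slow ETL step here and note its elapsed time ...

-- Restore the original size once the comparison is done.
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'MEDIUM';
```

If the runtime barely changes on the larger warehouse, the bottleneck is probably not raw compute, and the query profile is the better place to keep looking.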