Snowflake show tables not accessed in last 20 days


I need to clean up my databases in Snowflake. We have around 40 databases, and each database has more than 100 tables. Some are loaded every day, and some are not loaded but are still used every day. However, a lot of tables have been added for testing and other purposes (by many developers and users).

Now we are working on cleaning up unused tables.

We have the query_history table, which gives us information about queries run in the past. It has fields such as database, warehouse, and user, but not table.

I was wondering if there is any way to write a query that gives us the names of tables not used (both DDL and DML) in the last 10 days.


2
On

The information schema has a tables view with a last_altered column. Will that work for you? It will not tell you when a table was last accessed, only when it was last altered. Other than this, there is no easy way to get this information from Snowflake at this time. I also needed this, and I think we should submit a feature request for it.

select table_schema,
       table_name,
       last_altered
from information_schema.tables
where table_type = 'BASE TABLE'
      and last_altered < dateadd( 'DAY', -10, current_timestamp() ) 
order by table_schema,
         table_name;
1
On
select obj.value:objectName::string objName
      , max(query_start_time) as QUERY_DATE_TIME
    from snowflake.account_usage.access_history 
    , table(flatten(direct_objects_accessed)) obj
    group by 1
    order by QUERY_DATE_TIME desc;
1
On
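-- Note: last_altered records the last DDL/DML change, not the last read,
-- so "last_use" below is only a proxy for activity.
select table_catalog || '.' || table_schema || '.' || table_name as table_path, 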
    table_name, table_schema as schema,
    table_catalog as database, bytes,
    to_number(bytes / power(1024,3),10,2) as gb, 
    last_altered as last_use,
    datediff('day',last_use,current_date) as days_since_last_use
from information_schema.tables
where days_since_last_use > 90 --use your days threshold
order by bytes desc;
0
On

You can get the list of tables from information_schema.tables and the access history from snowflake.account_usage.access_history. Looking for cases where a table exists in the former but not the latter (in the last 20 days) should give you what you need.


with table_list as (

    select

        table_catalog as table_database,
        table_schema,
        table_name
        
    from information_schema.tables

),

access_history AS (

    SELECT
   
        query_start_time,
        -- objectName is fully qualified (DATABASE.SCHEMA.TABLE); cast it to a string before splitting
        split(base.value:objectName::string, '.')[0]::string as table_database,
        split(base.value:objectName::string, '.')[1]::string as table_schema,
        split(base.value:objectName::string, '.')[2]::string as table_name
    
    from snowflake.account_usage.access_history
        -- one row per accessed object; flattening the columns array as well is not needed here
        , lateral flatten (input => base_objects_accessed) base
    where query_start_time > current_date - interval '20 days'

),

tables_with_access as (

    select distinct

        table_database,
        table_schema,
        table_name

    from access_history

)

select *
from table_list
    left join tables_with_access twa
        using (table_database,table_schema, table_name)
where twa.table_name is null
;
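
As a possible extension, assuming your role can read the SNOWFLAKE.ACCOUNT_USAGE schema: information_schema.tables only covers the current database, and base_objects_accessed only records reads. The sketch below is one way to check all databases at once and to also count writes (objects_modified) and recent changes (last_altered). The 20-day window and the CTE names are illustrative, and it assumes table names are unquoted uppercase identifiers so that the concatenated path matches objectName.

```sql
-- Sketch only: tables in any database with no recorded read, write,
-- or alteration in the last 20 days. Adjust the window as needed.
with table_list as (

    select
        table_catalog as table_database,
        table_schema,
        table_name,
        last_altered
    from snowflake.account_usage.tables
    where deleted is null                      -- ignore dropped tables
      and table_type = 'BASE TABLE'
      and last_altered < dateadd('day', -20, current_timestamp())

),

touched as (

    -- objects read by queries in the window
    select r.value:objectName::string as object_name
    from snowflake.account_usage.access_history ah
        , lateral flatten(input => ah.base_objects_accessed) r
    where ah.query_start_time > dateadd('day', -20, current_timestamp())

    union

    -- objects written by queries in the window
    select w.value:objectName::string as object_name
    from snowflake.account_usage.access_history ah
        , lateral flatten(input => ah.objects_modified) w
    where ah.query_start_time > dateadd('day', -20, current_timestamp())

)

select tl.*
from table_list tl
where not exists (
    select 1
    from touched t
    -- assumption: names are unquoted uppercase, so this path equals objectName
    where t.object_name = tl.table_database || '.' || tl.table_schema || '.' || tl.table_name
)
order by tl.table_database, tl.table_schema, tl.table_name;
```

The account_usage views are populated with some latency, so very recent activity may not show up yet. It is worth reviewing the result before dropping anything.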