Vertica Table Analysis

350 Views Asked by At

I would like to analyze table usage on Verica to check the following

  1. the tables that are hit most be queries
  2. tables that are getting more write queries
  3. tables that are getting more read queries.

So I am asking for help for SQL query or if anyone has any documents please point me in right direction. Thank you.

1

There are 1 best solutions below

0
marcothesane On

Here, I create a function QTYPE() that assigns a request of type 'QUERY' to either a SELECT, an INSERT, or a MODIFY (meaning DELETE,UPDATE,MERGE). The differentiation comes from the fact that, in Vertica, UPDATE/MERGE are actually DELETEs, then INSERTs.

I use two regular expressions of a certain complexity: first, finding [schema.]tablename after a JOIN or FROM keyword, then finding [schema.]tablename after either the UPDATE, the INSERT INTO, the MERGE INTO and the DELETE FROM keywords. Then, I join back to the tables system table to a) only select the tables really existing and b) add the schema name if it is missing.

The final report would be:

 qtype  |                        tbname                        | tx_count 
--------+------------------------------------------------------+----------
 INSERT | dbadmin.nrm_cpustats_rate                            |       74  
 INSERT | dbadmin.v_poll_item                                  |       39  
 INSERT | dbadmin.child                                        |       32  
 INSERT | dbadmin.tbid                                         |       32  
 INSERT | dbadmin.etl_group_membership                         |       12  
 INSERT | dbadmin.sensor_oco                                   |       11  
 INSERT | webanalytics.webtraffic_part                         |       10  
 INSERT | webanalytics.webtraffic_new_design_platform_datadate |        9   
 MODIFY | cp.foo                                               |        2   
 MODIFY | public.foo                                           |        2   
 MODIFY | taboola_tests.foo                                    |        2   
 SELECT | dbadmin.flext                                        |      112 
 SELECT | dbadmin.children                                     |      112 
 SELECT | dbadmin.ffoo                                         |      112 
 SELECT | dbadmin.demovals                                     |      112 
 SELECT | dbadmin.allbut4                                      |      112 
 SELECT | dbadmin.allcols                                      |      112 
 SELECT | dbadmin.allbut1                                      |      112 
 SELECT | dbadmin.flx                                          |      112 

Here's the function definition, and the CREATE TABLE statement to collect the statistics of what you're looking for, and finally the query getting the 'hit parade' of the most touched tables ...

Mind you, it might become a long runner with a lot of history in your query_requests table ...

CREATE OR REPLACE FUNCTION qtype(sql VARCHAR(64000))
RETURN VARCHAR(8) AS BEGIN
  RETURN
    CASE UPPER(REGEXP_SUBSTR(sql,'\w+')::VARCHAR(16))
      WHEN 'SELECT' THEN 'SELECT'
      WHEN 'WITH'   THEN 'SELECT'
      WHEN 'AT'     THEN 'SELECT'
      WHEN 'INSERT' THEN 'INSERT'
      WHEN 'DELETE' THEN 'MODIFY'
      WHEN 'UPDATE' THEN 'MODIFY'
      WHEN 'MERGE'  THEN 'MODIFY'
      ELSE UPPER(REGEXP_SUBSTR(sql,'\w+')::VARCHAR(16))
    END
  ;
END;

DROP TABLE IF EXISTS table_op_stats;
CREATE TABLE table_op_stats AS 
WITH
-- need 1000 integers - up to ~400 source tables found in 1 select
i(i) AS (
  SELECT MICROSECOND(tm)
  FROM (
              SELECT TIMESTAMPADD(MICROSECOND,   1,'2000-01-01'::TIMESTAMP)
    UNION ALL SELECT TIMESTAMPADD(MICROSECOND,1000,'2000-01-01'::TIMESTAMP)
  ) l(ts)
  TIMESERIES tm AS '1 MICROSECOND' OVER(ORDER BY ts)
)
,
tblist AS (
-- selects can affect several types, found by JOIN or FROM keyword before
-- hence look_behind regular expression
SELECT 
    QTYPE(request) AS qtype
  , transaction_id
  , statement_id
  , i
  , LTRIM(REGEXP_SUBSTR(request,'(?<=(from|join))\s+(\w+\.)?\w+\b',1,i,'i')) as tbname
  FROM query_requests CROSS JOIN i
  WHERE request_type='QUERY'
    AND success
    AND LTRIM(REGEXP_SUBSTR(request,'(?<=(from|join))\s+(\w+\.)?\w+\b',1,i,'i')) <> ''
  UNION ALL
  -- insert/delete/update/merge queries only affect one table each
  SELECT
    QTYPE(request) AS qtype
  , transaction_id
  , statement_id
  , 1 AS i
  , LTRIM(REGEXP_SUBSTR(request,'(insert\s+.*into\s+|update\s+.*|merge\s+.*into|delete\s+.*from)\s*((\w+\.)?\w+)\b',1,1,'i',2)) as tbname
  FROM query_requests
  WHERE request_type='QUERY'
    AND success
    AND QTYPE(request) <> 'SELECT'
)
,
-- join back to the "tables" system table - removes queries from correlation names, and adds schema name if needed
real_tables AS (
  SELECT
    qtype
  , transaction_id
  , statement_id
  , i
, CASE WHEN SPLIT_PART(tbname,'.',2)=''
    THEN table_schema||'.'||tbname
    ELSE tbname
  END AS tbname
  FROM tblist
  JOIN tables ON CASE WHEN SPLIT_PART(tbname,'.',2)=''
                   THEN tbname=table_name
                   ELSE SPLIT_PART(tbname,'.',1)=table_schema AND SPLIT_PART(tbname,'.',2)=table_name
                 END
)
SELECT
  qtype
, transaction_id
, statement_id
, i
, tbname
FROM real_tables;
-- Time: First fetch (0 rows): 42483.769 ms. All rows formatted: 42484.324 ms

-- the query at the end:
WITH grp AS (
  SELECT
    qtype
  , tbname
  , COUNT(*) AS tx_count
  FROM table_op_stats
  GROUP BY 1,2
)
SELECT
  *
FROM grp
LIMIT 8 OVER(
  PARTITION BY qtype
  ORDER BY tx_count DESC
);