How to pivot in bigQuery using PIVOT?

2.4k Views Asked by At

I am trying to pull rows as columns in bigquery. This is how my data looks like now: enter image description here

This is how I want my data to look like: enter image description here

PS: While I have shown only 3 values in column SUB_CLASS_DESC actual count is in 100s. Hence, I am looking to use Procedural language as per documentation here. I followed the example shared here in towardsdatascience.com and wrote below code, but unfortunately that doesn't work:

DECLARE DEPT_CLASS_SUB_CLASS STRING;
SET DEPT_CLASS_SUB_CLASS = (SELECT CONCAT('("', STRING_AGG(DISTINCT DEPT_CLASS_SUB_CLASS, '", "'), '")')
FROM `analytics-mkt-cleanroom.Workspace.HS_AF_SG_R12_800K_SAMPLE_SALES_11_TEST`
);

EXECUTE IMMEDIATE FORMAT("""
CREATE OR REPLACE TABLE `analytics-mkt-cleanroom.Workspace.HS_AF_SG_R12_800K_SAMPLE_SALES_PIVOTED_12_TEST` AS 
SELECT * FROM 
(SELECT HH_ID,DEPT_CLASS_SUB_CLASS,SALE_AMT 
FROM `analytics-mkt-cleanroom.Workspace.HS_AF_SG_R12_800K_SAMPLE_SALES_11_TEST`
)
PIVOT
(SUM(SALE_AMT)
,FOR DEPT_CLASS_SUB_CLASS IN %s
)""",DEPT_CLASS_SUB_CLASS);

Error I am getting: enter image description here

Error message suggests to declare before the execute block, and I am doing exactly that, but I don't understand why the error still persists. I tried declaring variables DEPT_CLASS_SUB_CLASS in different ways but not successful yet. Could anyone please point out where I might be making the mistake.

Much appreciated!

2

There are 2 best solutions below

4
On BEST ANSWER

Consider below approach

execute immediate (select '''
  select *
  from your_table
  pivot (any_value(sale_amt) for replace(sub_class_desc, ' ', '_') in (''' || list || '''))
  '''
  from (
    select string_agg(distinct "'" || replace(sub_class_desc, ' ', '_') || "'") list
    from your_table
  )
)     

if applied to dummy data as in your question - output is

enter image description here

How can I save these results into a new pivoted table? Specifically where can I put my CREATE OR REPLACE TABLE?

execute immediate (select '''
  create or replace table `your_project.your_dataset.pivot_table` as 
  select *
  from your_table
  pivot (any_value(sale_amt) for replace(sub_class_desc, ' ', '_') in (''' || list || '''))
  '''
  from (
    select string_agg(distinct "'" || replace(sub_class_desc, ' ', '_') || "'") list
    from your_table
  )
);
0
On
  1. DEPT_CLASS_SUB_CLASS variable should be placed before any other statement, not just before an execute block being referenced. From your error message, you seems to declare a variable at [411:1] which means at 411 line. Kindly move it to the top of your script at line 1 and test it again.

  2. you have kind of a PIVOTing problem. I wrote down some test query which do PIVOTing and list columns in an alphabetical order at the same time.

DECLARE sample_data ARRAY<STRUCT<HH_ID STRING, SUB_CLASS_DESC STRING, SALE_AMT FLOAT64>> DEFAULT [
  ('HHH_001', 'K&B FIXTURE/PLUMBING', 139.),
  ('HHH_001', 'PULLDOWN KITCHEN FAUCETS', 129.),
  ('HHH_001', 'TUBULAR REPAIR & REPLACE', 0.)
];

CREATE TEMP TABLE data AS
  SELECT r.* REPLACE(TRANSLATE(SUB_CLASS_DESC, ' &/', '___') AS SUB_CLASS_DESC)
    FROM UNNEST(sample_data) r
;

EXECUTE IMMEDIATE FORMAT ("""
  SELECT * 
    FROM data  
   PIVOT (SUM(SALE_AMT) AS sale_amt FOR SUB_CLASS_DESC IN ('%s'));
""", (SELECT STRING_AGG(DISTINCT SUB_CLASS_DESC, "','"  ORDER BY SUB_CLASS_DESC ASC) FROM data)
);

Query Result