BigQuery Update a Repeated column in a Table using another table

155 Views Asked by At

I am trying to write an update or merge BigQuery SQL to update the data_profile.name of table1 (repeated column) using another reference table having dataprofile data of table2 (flattendata).

enter image description here

Using SQL:

UPDATE `dataset.table_1`
SET data_profiles = ARRAY(
  SELECT (
    adp.id,dp.name
    )
     FROM UNNEST(data_profiles) AS adp join `dataset.table_2` dp on adp.id =dp.id 
)
WHERE true

but we want to update only the data_profile.name which got updated in SOURCE TABLE to avoid updation on all columns.

Please suggest any merge or update statement for this scenario.

2

There are 2 best solutions below

0
pmo511 On

You're trying to update an array_agg struct with an array, try something like this:

UPDATE `dataset.table_1` t1
SET t1.data_profiles  = (
   SELECT ARRAY_AGG(STRUCT(t2.id, t2.name))
   FROM `dataset.table_2` t2
   WHERE t2.id = t1.id
   GROUP BY t2.id
)
WHERE true
0
Damião Martins On

Assuming the source table have only delta changes and target table have an unique row identifier (e.g. col1), you can do a MERGE like this:

MERGE `lab.target_table` T
USING (
  SELECT 
    t.col1,
    ARRAY_AGG(
        STRUCT(
          adp.id,
          COALESCE(dp.name, adp.name) as name
        )
    ) as data_profiles
  FROM `lab.target_table` t, UNNEST(t.data_profiles) AS adp 
    LEFT JOIN `lab.source_table` dp on adp.id = dp.id 
  WHERE 
    EXISTS(SELECT * FROM UNNEST(t.data_profiles) AS adp join `lab.source_table` dp on adp.id = dp.id)
  GROUP BY col1
) S
ON T.col1 = S.col1
WHEN MATCHED THEN
  UPDATE SET data_profiles = S.data_profiles, dt = CURRENT_TIMESTAMP()

That way only rows in target table that contains data_profiles in the source table will be updated.