I am implementing SCD 2 dimensions using MERGE statement in google cloud BigQuery.
In target table, is_active column indicates whether the record is the latest record. address_ref is the business key of each record. I am using subquery to indicate the action that need to be performed. Below is the my query:
MERGE `datahub-sit.dim_address` AS T
USING
(
SELECT
COALESCE(staging.address_ref, dim.address_ref) AS address_ref,
dim.building AS current_building,
staging.building AS new_building,
dim.road AS current_road,
staging.road AS new_road,
CASE
WHEN dim.address_ref IS NULL THEN 'NEW_INSERT'
WHEN staging.building <> dim.building THEN 'UPDATE_CURRENT'
WHEN staging.road <> dim.road THEN 'UPDATE_CURRENT'
END
AS MergeAction
FROM
`datahub-sit.staging_address` staging
FULL JOIN (
SELECT *
FROM `datahub-sit.dim_address`
WHERE is_active=TRUE
) AS dim
ON
staging.address_ref = dim.address_ref
UNION ALL
SELECT
dim.address_ref,
dim.building AS current_building,
staging.building AS new_building,
dim.road AS current_road,
staging.road AS new_road,
'NEW_INSERT' AS MergeAction
FROM
`datahub-sit.staging_address` staging
INNER JOIN
`datahub-sit.dim_address` dim
ON
staging.address_ref = dim.address_ref
AND staging.building <> dim.building
AND staging.road <> dim.road
) AS S
ON T.address_ref = S.address_ref AND S.MergeAction IN ('UPDATE_CURRENT')
WHEN MATCHED AND S.MergeAction = "UPDATE_CURRENT" THEN UPDATE SET is_active=FALSE
WHEN NOT MATCHED BY TARGET AND S.MergeAction = 'NEW_INSERT' THEN
INSERT(
address_ref,
building,
block,
road,
postcode,
address_deleted,
address_updated,
is_active
)
VALUES(
S.address_ref,
S.new_building,
block,
S.new_road
postcode,
S.current_road,
S.new_road,
TRUE
);
Then I got an error Unrecognized name: block at [39:5] in an INSERT() VALUES() statement. I tried to define "T.block" but still got an error Unrecognized name: T at [39:5].
Below is the fields in dim_address
And staging_address
Is there any way to change value of some columns but the remaining columns remain the same? The idea is that I will take some new value from "source S" and keep the remaining of other columns from "target T". But when I using statement WHEN NOT MATCHED BY TARGET, there is only values from S can be selected.
Thank you!


Yes, I think you can use a subquery in the
INSERT()statement to exclude the block column.Here is updated query: