In DB2, I have a "live" table with data that is copied to an identically structured backup table daily. Since some of the live data may change after the backup, the respective columns need to be updated later. I was trying to achieve this with the following SQL:
UPDATE table_a hist
SET someCol = (
        SELECT someCol
        FROM table_b doc
        WHERE hist.someCol = doc.someCol)
WHERE EXISTS (
        SELECT *
        FROM table_b doc
        WHERE hist.ID = doc.ID
          AND (hist.someCol != doc.someCol
            OR (hist.someCol IS NULL
                AND doc.someCol IS NOT NULL)));
This results in the following error:
The result of a scalar fullselect, SELECT INTO statement, or VALUES INTO statement is more than one row.
Since I could not find any mistake in the query despite an extended search, I tried a MERGE instead:
MERGE INTO table_a hist
USING (SELECT DISTINCT someCol FROM table_b) doc
ON (hist.someCol != doc.someCol)
WHEN MATCHED THEN UPDATE SET hist.someCol = doc.someCol
This gives another error:
The statement was not processed because a row of target table "MTCS.MOMS_SPLIT_DOC_HIST" was identified more than once for update, delete or insert
So, first of all, which approach is better? I think the MERGE is faster, but apparently I haven't quite understood how it works, and the UPDATE is also harder to read. And what is wrong with the MERGE? As I read it, it should compare both tables on the someCol field and update that field in table_a wherever the values in the two tables differ.
I think that even if you get your MERGE/UPDATE to work, it is still going to be pretty heavy, especially as your tables grow larger and larger. So instead of running that heavy query, the following steps could help:

1. Add an extra hash column to both tables.
2. Whenever a row is inserted or updated in the live table, compute a hash value over the columns that can change and store it in that column.
3. When syncing the backup, join the tables on their ID and compare only the hash columns instead of every single column.

So next time, all you need to do is compare this hash value.
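A minimal sketch of what that could look like, assuming someCol is a character column and a Db2 release that offers an MD5 hash function such as HASH_MD5 (if yours does not, substitute whatever hash or checksum function your platform provides, or compare the concatenated columns directly). The row_hash column name is a placeholder:

-- add a hash column to both tables (row_hash is a placeholder name)
ALTER TABLE table_a ADD COLUMN row_hash CHAR(32);
ALTER TABLE table_b ADD COLUMN row_hash CHAR(32);

-- keep the hash current in the live table: HEX() turns the binary digest
-- into a comparable character string, and COALESCE keeps a NULL column
-- from producing a NULL hash
UPDATE table_b
SET row_hash = HEX(HASH_MD5(COALESCE(someCol, '')));

-- the daily sync then only touches rows whose hashes differ; joining on
-- the key guarantees each target row is identified at most once
MERGE INTO table_a hist
USING table_b doc
ON hist.ID = doc.ID
WHEN MATCHED AND (hist.row_hash IS NULL OR hist.row_hash <> doc.row_hash) THEN
    UPDATE SET someCol = doc.someCol,
               row_hash = doc.row_hash;

Note that the ON clause here joins on the key rather than on the compared column, so each row of table_a is matched at most once; that is also why your original MERGE raised the "identified more than once" error.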