I have a table with around 100 million records, and every day we receive around 100K records containing updates.
Currently we combine the target table and the incremental data with UNION ALL, apply ROW_NUMBER on the timestamp, and pick the latest record per account.
With this approach we are facing serious performance issues.
Can you suggest a better approach from a performance perspective?
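INSERT OVERWRITE TABLE tgt_tbl
SELECT acct_num, time_stamp
FROM
(
  SELECT acct_num, time_stamp,
         -- number the rows per account, newest first, so rnum = 1 is the latest record
         row_number() OVER (PARTITION BY acct_num ORDER BY time_stamp DESC) AS rnum
  FROM
  (
    SELECT acct_num, time_stamp FROM tgt_tbl
    UNION ALL
    -- incremental_tbl is a placeholder name for the daily incremental table
    SELECT acct_num, time_stamp FROM incremental_tbl
  ) t1
) t2
WHERE rnum = 1;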