Retain the latest record in a Hive table


I have a table with around 100 million records, and every day we receive around 100K records containing updates. Currently we UNION ALL the existing table with the incremental data, apply ROW_NUMBER ordered by the timestamp, and keep only the latest record for each acct_num.

With this approach we are facing serious performance issues.

Can you suggest a better approach from a performance perspective?

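INSERT OVERWRITE TABLE tgt_tbl
SELECT acct_num, time_stamp
FROM
(
  SELECT acct_num, time_stamp,
         -- rnum = 1 marks the most recent row for each acct_num
         row_number() OVER (PARTITION BY acct_num ORDER BY time_stamp DESC) AS rnum
  FROM
     (SELECT acct_num, time_stamp FROM tgt_tbl
       UNION ALL
      SELECT acct_num, time_stamp FROM incremental_tbl  -- placeholder name for the incremental table
     ) t1
) t2
WHERE rnum = 1;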