Insert data and fill columns with rank

624 Views Asked by At

original data: orginal_table

MID STATE CALL_TIME         RECORD_RANK
a   1    2020-12-18 09:00:00        1
a   2    2020-12-19 09:00:00        2
b   1    2020-12-18 09:00:02        1
c   1    2020-12-18 09:00:03        1
c   1    2020-12-19 09:00:03        2
c   1    2020-12-20 09:00:03        3
d   1    2020-12-19 09:00:00        1

The data I wanted to insert: insert_table

   MID  STATE      CALL_TIME  
   a    2     2020-12-30 09:00:00      
   b    2     2020-12-19 09:00:02   
   c    1     2020-12-21 09:00:03 
   e    1     2020-12-30 09:00:00 
   f    1     2020-12-30 09:00:00 
   f    2     2020-12-31 09:00:00

Goal

  • The original data will be inserted from the second data.
  • For original and inserted data, the pair MID and CALL_TIME is unique.
  • There is no RECORD_RANK column in the inserted data but RECORD_RANK will be calculated based on MID and CALL_TIME columns when inserted. When duplicated MID with different CALL_TIME, the value of RECORD_RANK with MID will be added by 1. The initial value is 1.
  • The earliest row in in insert_table is always later than the latest row in orginal_table with the same MID.

The expected example result as below:

MID  STATE      CALL_TIME         RECORD_RANK
a    1    2020-12-18 09:00:00        1
a    2    2020-12-19 09:00:00        2
b    1    2020-12-18 09:00:02        1
c    1    2020-12-18 09:00:03        1
c    1    2020-12-19 09:00:03        2
c    1    2020-12-20 09:00:03        3
d    1    2020-12-19 09:00:00        1
a    2    2020-12-30 09:00:00        3
b    2    2020-12-19 09:00:02        2  
c    1    2020-12-21 09:00:03        4
e    1    2020-12-30 09:00:00        1
f    1    2020-12-30 09:00:00        1 
f    2    2020-12-31 09:00:00        2

Note

  • mysql version: 5.5.47-log
2

There are 2 best solutions below

0
On BEST ANSWER

I think it is possible to handle the logic in a single insert, even in MySQL 5.x.

The target rank is the number of rows that already exists in the target table for the same mid, plus the number of mid rows in the source table prior to the current row. You can compute that with correlated subqueries:

insert into orginal_table (mid, state, call_time, record_rank)
select mid, state, call_time,
    1 + (
        select count(*)
        from orginal_table o
        where o.mid = i.mid
    ) + (
        select count(*) 
        from insert_table i1 
        where i1.mid = i.mid and i1.call_time < i.call_time
    ) as record_rank
from insert_table i

This assumes that all new rows are more recent that existing rows, as mentioned in your question. But if you want otherwise, that's an easy fix to the first subquery:

(
    select count(*)
    from orginal_table o
    where o.mid = i.mid and o.call_time < i.call_time
)

Here is a demo based on the nice test case built by Akina.

Side note: in MySQL 8.0, we would use a window function instead of the second subquery, which would make the query much more efficient:

insert into orginal_table (mid, state, call_time, record_rank)
select mid, state, call_time,
    row_number() over(partition by mid order by call_time) 
    + (
        select count(*)
        from orginal_table o
        where o.mid = i.mid
    ) as record_rank
from insert_table i
8
On

If the ealiest row in insert_table is always later than the latest row in orginal_table with the same MID then you may use BEFORE INSERT trigger:

CREATE TRIGGER tr_bi_original
BEFORE INSERT
ON orginal_table
FOR EACH ROW
SET NEW.RECORD_RANK = (SELECT COALESCE(COUNT(*), 0) + 1
                       FROM orginal_table
                       WHERE NEW.MID = orginal_table.MID)

After trigger creation you may simply add new rows by

INSERT INTO orginal_table
SELECT *, NULL FROM insert_table;

New values for RECORD_RANK will be added by the trigger.

fiddle


If the rows from both tables will mix and RECORD_RANK for some rows which are already present in orginal_table must be altered then the operation cannot be performed using one query (because both insert for new rows and update for existing ones needed). In this case I recommend to insert the rows with any (NULL) value for RECORD_RANK column, then recalculate the column value for all rows in a table.