MySQL select max record from each group and insert into another table


Table A has 4 columns: id, name, create_time and content.

create table A
(
    id int primary key,
    name varchar(20),
    create_time datetime,
    content varchar(4000)
);
create table B like A;

For each name, I want to select the record(s) with the maximum create_time and insert them into another table, B.

I executed the following SQL, but the time it takes is unacceptable:

insert into B
select A.*
from A,
    (select name, max(create_time) create_time from A group by name) tmp
where A.name = tmp.name
  and A.create_time = tmp.create_time;

Table A has 10 million rows and is about 10 GB; executing the SQL takes 200 s.

Is there any way to do this job faster, or which MySQL server parameters could I change to make it run faster?

PS: table A can be of any type, e.g. a partitioned table or something else.

First, make sure you have proper indexes on A (name, create_time) and B (name, create_time), then try an explicit JOIN with an ON condition:

insert into B 
select A.* 
from A 
inner join ( 
    select name, max(create_time) create_time 
    from A
    group by name) tmp on (A.name = tmp.name and A.create_time = tmp.create_time);
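A minimal, self-contained sketch of this approach with made-up sample rows; the index name idx_a_name_create_time is hypothetical and MySQL 8.0 is assumed. It computes the per-name maximum from A, because B is the initially empty target table and grouping over B would find nothing to insert. B's columns are spelled out instead of using CREATE TABLE B LIKE A only to keep the script portable.

```sql
-- Sample schema mirroring the question's table A.
CREATE TABLE A (
    id          INT PRIMARY KEY,
    name        VARCHAR(20),
    create_time DATETIME,
    content     VARCHAR(4000)
);

-- Target table with the same columns (instead of CREATE TABLE B LIKE A).
CREATE TABLE B (
    id          INT PRIMARY KEY,
    name        VARCHAR(20),
    create_time DATETIME,
    content     VARCHAR(4000)
);

-- Composite index on (name, create_time) so the server can use it for
-- the GROUP BY and for the join (hypothetical index name).
CREATE INDEX idx_a_name_create_time ON A (name, create_time);

-- Made-up rows: alice has two versions, bob has one.
INSERT INTO A VALUES
    (1, 'alice', '2020-01-01 10:00:00', 'alice v1'),
    (2, 'alice', '2020-01-02 10:00:00', 'alice v2'),
    (3, 'bob',   '2020-01-01 09:00:00', 'bob v1');

-- Explicit INNER JOIN against the per-name maximum, computed from A.
INSERT INTO B
SELECT A.*
FROM A
INNER JOIN (
    SELECT name, MAX(create_time) AS create_time
    FROM A
    GROUP BY name
) tmp ON A.name = tmp.name AND A.create_time = tmp.create_time;

-- B now holds rows 2 (alice v2) and 3 (bob v1).
SELECT * FROM B ORDER BY id;
```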

The query you need is:

INSERT INTO B
SELECT m.*
FROM A m                                      # m from "max"
LEFT JOIN A l                                 # l from "later"
    ON m.name = l.name                        # the same name
        AND m.create_time < l.create_time     # "l" was created later than "m"
WHERE l.name IS NULL;                         # there is no "later"

How it works:

It joins A, aliased as m (from "max"), against itself, aliased as l (from "later" than "max"). The LEFT JOIN ensures that, in the absence of a WHERE clause, all the rows from m are present in the result set. Each row from m is combined with all rows from l that have the same name (m.name = l.name) and were created after the row from m (m.create_time < l.create_time); a row from m that has no such partner is combined with NULLs in all the l columns. The WHERE condition keeps in the result set only the rows from m that have no match in l (there is no record with the same name and a later creation time).
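To see the anti-join at work, here is a small self-contained run with made-up rows (MySQL 8.0 assumed; B is spelled out instead of CREATE TABLE B LIKE A). alice has three rows: row 1 is matched by rows 2 and 3, row 3 is matched by row 2, and row 2 has no later partner, so its l columns are NULL. bob's only row (4) has no partner either, so the WHERE l.name IS NULL filter keeps rows 2 and 4.

```sql
CREATE TABLE A (
    id          INT PRIMARY KEY,
    name        VARCHAR(20),
    create_time DATETIME,
    content     VARCHAR(4000)
);
CREATE TABLE B (
    id          INT PRIMARY KEY,
    name        VARCHAR(20),
    create_time DATETIME,
    content     VARCHAR(4000)
);

-- Made-up rows; alice's latest row is id 2, not the highest id.
INSERT INTO A VALUES
    (1, 'alice', '2020-01-01 10:00:00', 'a1'),
    (2, 'alice', '2020-01-03 10:00:00', 'a3'),
    (3, 'alice', '2020-01-02 10:00:00', 'a2'),
    (4, 'bob',   '2020-01-01 09:00:00', 'b1');

-- The (m, l) pairs before the WHERE filter; l_id is NULL when m has no later row.
-- Pairs: (1,2), (1,3), (2,NULL), (3,2), (4,NULL)
SELECT m.id AS m_id, l.id AS l_id
FROM A m
LEFT JOIN A l
    ON m.name = l.name
   AND m.create_time < l.create_time
ORDER BY m.id, l.id;

-- Keep only the rows of m that have no later row with the same name.
INSERT INTO B
SELECT m.*
FROM A m
LEFT JOIN A l
    ON m.name = l.name
   AND m.create_time < l.create_time
WHERE l.name IS NULL;

-- B now holds rows 2 and 4.
SELECT * FROM B ORDER BY id;
```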

Discussion

If more than one row in A has the same name and the same (maximum) create_time, the query returns all of them. To keep only one of them, an additional condition is required.

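Replace the AND m.create_time < l.create_time line of the ON clause with:

AND (m.create_time < l.create_time
     OR (m.create_time = l.create_time AND m.id < l.id))

The parentheses are required: AND binds tighter than OR, so appending a bare OR (...) to the ON clause would also match rows that have a different name. Adjust/replace the m.id < l.id part of the condition to suit your needs (this version keeps, among the tied rows, the one with the largest id).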
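A self-contained sketch of the tie-breaking version, with made-up rows (MySQL 8.0 assumed). Here the ON clause reads m.name = l.name AND (m.create_time < l.create_time OR (m.create_time = l.create_time AND m.id < l.id)). alice's rows 1 and 2 tie on create_time, so only row 2 (the larger id) survives; bob's row 4 deliberately shares that timestamp. If the OR branch were instead appended to the ON clause without the parentheses, alice's row 2 would also be "matched" by bob's row 4 (same time, larger id) and only row 4 would be inserted.

```sql
CREATE TABLE A (
    id          INT PRIMARY KEY,
    name        VARCHAR(20),
    create_time DATETIME,
    content     VARCHAR(4000)
);
CREATE TABLE B (
    id          INT PRIMARY KEY,
    name        VARCHAR(20),
    create_time DATETIME,
    content     VARCHAR(4000)
);

-- Made-up rows: alice has a tie at her latest timestamp,
-- and bob's only row shares that same timestamp.
INSERT INTO A VALUES
    (1, 'alice', '2020-01-02 10:00:00', 'first of the tie'),
    (2, 'alice', '2020-01-02 10:00:00', 'second of the tie'),
    (3, 'alice', '2020-01-01 10:00:00', 'older'),
    (4, 'bob',   '2020-01-02 10:00:00', 'same time, other name');

-- "l" counts as later than "m" if it is newer, or equally new with a larger id.
-- The name condition applies to both branches thanks to the parentheses.
INSERT INTO B
SELECT m.*
FROM A m
LEFT JOIN A l
    ON m.name = l.name
   AND (m.create_time < l.create_time
        OR (m.create_time = l.create_time AND m.id < l.id))
WHERE l.name IS NULL;

-- B now holds exactly one row per name: 2 (alice) and 4 (bob).
SELECT * FROM B ORDER BY id;
```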

Make sure table A has an index on the columns used by the query (name and create_time); otherwise the performance improvement over your original query will not be significant.