SQL: Find duplicate records, but only return latest duplicate record?

231 Views Asked by At

I'm trying to find duplicate transmissions in our database, in this case, to find where a transmitter transmitted the same data multiple times in a row (3 or more)

I've tried this query, which gets me close, but returns one record per period of duplicate readings per transmitter, when I only want the most recent duplicate reading period

select transmitter_id,
       count(*)        as number_of_duplicate_readings,
       total_reading,
       max(transmissions.created_at) as latest_duplicate_reading
from transmissions
group by transmitter_id, total_reading
having count(*) > 3
order by latest_duplicate_reading desc;
+--------------+----------------------------+-------------+------------------------+
|transmitter_id|number_of_duplicate_readings|total_reading|latest_duplicate_reading|
+--------------+----------------------------+-------------+------------------------+
|205           |50                          |0            |2020-08-17 13:44:03     |
|204           |27                          |0            |2020-08-17 13:44:01     |
|202           |33                          |0            |2020-08-17 13:43:56     |
|200           |27                          |0            |2020-08-17 13:43:52     |
|197           |27                          |0            |2020-08-17 13:43:45     |
|196           |42                          |0            |2020-08-17 13:43:43     |
|194           |43                          |0            |2020-08-17 13:43:39     |
|193           |11                          |0            |2020-08-17 13:43:36     |
|192           |27                          |0            |2020-08-17 13:43:35     |
|505           |37                          |830          |2020-08-05 23:59:28     |
+--------------+----------------------------+-------------+------------------------+

If I order by transmitter ID, the issue becomes aparent:

+--------------+----------------------------+-------------+------------------------+
|transmitter_id|number_of_duplicate_readings|total_reading|latest_duplicate_reading|
+--------------+----------------------------+-------------+------------------------+
|2             |25                          |300          |2019-09-25 23:09:04     |
|2             |4                           |310          |2019-09-29 23:18:58     |
|2             |5                           |320          |2019-10-04 23:25:27     |
|3             |7                           |15670        |2020-02-06 23:25:14     |
|3             |4                           |16990        |2020-02-24 23:08:26     |
|3             |24                          |19810        |2020-04-13 23:30:59     |
|3             |15                          |19830        |2020-04-28 23:50:35     |
|3             |12                          |20160        |2020-05-19 23:33:27     |
|3             |8                           |20170        |2020-05-27 15:54:51     |
|3             |6                           |20180        |2020-06-21 23:45:22     |
+--------------+----------------------------+-------------+------------------------+

Does anyone know of a way to accomplish what I want in MySQL / MariaDB, or will I need to filter this further in other code to accomplish what I want?

Thanks!

1

There are 1 best solutions below

4
Thorsten Kettner On

With MySQL 5.1 you have no analytic function (MAX OVER etc.) available, nor even a WITH clause. The only option I see there is copy & paste. Use about the same query twice, once to find all duplicates, once to check for later duplicates.

select 
  transmitter_id,
  count(*) as number_of_duplicate_readings,
  total_reading,
  max(transmissions.created_at) as latest_duplicate_reading
from transmissions t
group by transmitter_id, total_reading
having count(*) > 3
and not exists
(
  select null
  from transmissions t2
  group by t2.transmitter_id, t2.total_reading
  having count(*) > 3
  and t2.transmitter_id = t.transmitter_id
  and max(t2.transmissions.created_at) > t.latest_duplicate_reading
)
order by latest_duplicate_reading desc;