Remove duplicate overlapping dates in DB2 Table

505 Views Asked by At

Similar to other questions, but different....

I have multiple records and have flagged the ones that may be deleted if they contain ANY overlap with other records (that are not marked for deletion). I'd like to be able to do this efficiently on a 5+ million row table.

Here is what I have... and it's not working (not finding dups that I have put in manually to test)

delete from MYTABLE t1 
where t1.DELETABLEFLAG = 'Y' and
exists
       (select 1
       from
              MYTABLE t2
       where
              t2.MATCHCOLUMN = t1.MATCHCOLUMN 
              and t2.DELETABLEFLAG <>'Y'
              and
              ((t1.START_DATE between t2.START_DATE and t2.END_DATE) or
              (t1.END_DATE between t2.START_DATE and t2.END_DATE) or
              (t2.START_DATE between t1.START_DATE and t1.END_DATE) or
              (t2.END_DATE between t1.START_DATE and t1.END_DATE) )
       )

PS, I've obviously renamed the columns, so please no comments on my naming :)

Is there a more efficient way to do this?

PPS. START_DATE and END_DATE are TIMESTAMPS

1

There are 1 best solutions below

3
On BEST ANSWER

Two time periods overlap if both the following conditions are true:

  • The first starts before the second ends.
  • The first ends after the second starts.

I would write the query as:

delete from MYTABLE t1 
where t1.DELETABLEFLAG = 'Y' and
exists (select 1
        from MYTABLE t2
        where t2.MATCHCOLUMN = t1.MATCHCOLUMN AND
              t2.DELETABLEFLAG <> 'Y' AND
              t1.START_DATE <= t2.END_DATE AND
              t1.END_DATE >= t2.START_DATE
      );

Perhaps this will help.