oracle sql - finding entries with dates (start/end column) overlap

626 Views Asked by At

So data is something like this:

ID | START_DATE       | END_DATE         |  UID  | CANCELED
-------------------------------------------------
44 | 2015-10-20 22:30 | 2015-10-20 23:10 | 'one' |
52 | 2015-10-20 23:00 | 2015-10-20 23:30 | 'one' |
66 | 2015-10-21 13:00 | 2015-10-20 13:30 | 'two' | 

There are more than 100k of these entries.

We can see that start_date of the second entry overlaps with the end_date of the first entry. When dates do overlap, entries with lower id should be marked as true in 'CANCELED' column.

I tried some queries but they take a really long time so I'm not sure even if they work. Also I want to cover all overlaping cases so this also seems to slow this down.

I am the one responsible for inserting/updating these entries using pl/sql

update table set column = 'value' where ID = '44';
   if sql%rowcount = 0 
       then insert values(...)
   end if

so I could maybe do this in this step. But all tables are updated/inserted using one big pl/sql created dynamically where all rows either get updated or new ones get inserted so once again this seems to get slow.

And of all the sql 'dialects' oracle one is the most cryptic I had chance to work with. Ideas?

EDIT: I forgot one important detail, there is also one more column (UID) which is to be matched, update above

4

There are 4 best solutions below

3
On BEST ANSWER

I would start with this query:

update table t
    set cancelled = true
    where exists (select 1
                  from table t2
                  where t.end_date > t2.start_date and
                        t.uid = t2.uid and
                        t.id < t2.id
                 )

An index on table(uid, start_date, id) might help.

As a note: this is probably much easier to do when you create the table, because you can use lag().

0
On

This will do the trick without dynamic query nor correlated subqueries, but it consume some memory for the with clauses:

   MERGE INTO Table1 
   USING 
   (
   with q0 as( 
   select rownum fid, id, start_date from(
   select id, start_date from table1 
   union all 
   select 999999 id, null start_date from dual
   order by id
   )
   ), q1 as (
   select rownum fid, id, end_date from(
   select -1 id, null end_date from dual
   union all 
   select id, end_date from table1
   order by id
   )
   )
   select q0.fid, q1.id, q0.start_date, q1.END_DATE, case when (q0.start_date < q1.END_DATE) then 1 else 0 end canceled
   from q0
   join q1
   on (q0.fid = q1.fid)
   ) ta ON (ta.id = Table1.id)
WHEN MATCHED THEN UPDATE 
    SET Table1.canceled = ta.canceled;

The inner with select statement with alias ta will produce this result:

"FID"|"ID"|"START_DATE"     |"END_DATE"       |"CANCELED"
---------------------------------------------------------
1    |-1  |20/10/15 22:30:00|                 |0
2    |44  |20/10/15 23:00:00|20/10/15 23:10:00|1
3    |52  |21/10/15 13:00:00|20/10/15 23:30:00|0
4    |66  |                 |20/10/15 13:30:00|0

Then its used in the merge vwithout any correlated queries. Tested and worked fine using SQLDeveloper.

0
On

You can use BULK COLLECT INTO and FORALL to reduce context switching within a procedure:

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE test ( ID, START_DATE, END_DATE, CANCELED ) AS 
          SELECT 44, TO_DATE( '2015-10-20 22:30', 'YYYY-MM-DD HH24:MI' ), TO_DATE( '2015-10-20 23:10', 'YYYY-MM-DD HH24:MI' ), 'N' FROM DUAL
UNION ALL SELECT 52, TO_DATE( '2015-10-20 23:00', 'YYYY-MM-DD HH24:MI' ), TO_DATE( '2015-10-20 23:30', 'YYYY-MM-DD HH24:MI' ), 'N' FROM DUAL
UNION ALL SELECT 66, TO_DATE( '2015-10-21 13:00', 'YYYY-MM-DD HH24:MI' ), TO_DATE( '2015-10-21 12:30', 'YYYY-MM-DD HH24:MI' ), 'N' FROM DUAL
/

CREATE PROCEDURE updateCancelled
AS
  TYPE ids_t IS TABLE OF test.id%TYPE INDEX BY PLS_INTEGER;
  t_ids   ids_t;
BEGIN
  SELECT ID
  BULK COLLECT INTO t_ids
  FROM   (
          SELECT ID,
                 END_DATE,
                 LEAD( START_DATE ) OVER ( ORDER BY START_DATE ) AS NEXT_START_DATE
          FROM   TEST )
  WHERE  END_DATE > NEXT_START_DATE;

  FORALL i IN 1 .. t_ids.COUNT
    UPDATE TEST
    SET    CANCELED = 'Y'
    WHERE  ID = t_ids(i);
END;
/

BEGIN
  updateCancelled();
END;
/

Query 1:

SELECT * FROM TEST

Results:

| ID |                START_DATE |                  END_DATE | CANCELED |
|----|---------------------------|---------------------------|----------|
| 44 | October, 20 2015 22:30:00 | October, 20 2015 23:10:00 |        Y |
| 52 | October, 20 2015 23:00:00 | October, 20 2015 23:30:00 |        N |
| 66 | October, 21 2015 13:00:00 | October, 21 2015 12:30:00 |        N |

Or as a single SQL statement:

UPDATE TEST
SET    CANCELED = 'R'
WHERE  ID IN ( SELECT ID
               FROM ( SELECT ID,
                             END_DATE,
                             LEAD( START_DATE )
                               OVER ( ORDER BY START_DATE )
                               AS NEXT_START_DATE
                      FROM TEST )
               WHERE END_DATE > NEXT_START_DATE )
3
On

I think the following update should work:

update tbl
   set cancelled = 'TRUE'
 where t_id in (select t_id
                  from tbl t
                 where exists (select 1
                          from tbl x
                         where x.t_id > t.t_id
                           and x.start_date <= t.end_date));

Fiddle: http://sqlfiddle.com/#!4/06447/1/0

If the table is extremely large, you might be better off creating a new table using a CTAS (create table as select) query, where you can use the nologging option, allowing you to avoid having to write to the undo log. When you execute an update like you are doing now, you are writing the changes to Oracle's undo log so that, prior to committing the transaction, you have the option to rollback. This adds overhead. As a result a CTAS query with nologging might run faster. Here is one way for that approach:

create table new_table nologging as
with sub as
 (select t_id,
         start_date,
         end_date,
         'TRUE' as cancelled
    from tbl t
   where exists (select 1
            from tbl x
           where x.t_id > t.t_id
             and x.start_date <= t.end_date))
select *
  from sub
union all
select   t.*
      from tbl t
 left join sub s
        on t.t_id = s.t_id
      where s.t_id is null;

Fiddle: http://sqlfiddle.com/#!4/c6a29/1