Gaps and Islands solution in Oracle - use of recursive

976 Views Asked by At

I have a problem that could be easily solved using curser in Oracle. However, I wonder if that could be done using select only. I have 1 data set that contains the following fields: Start, Description, MaximumRow, SequentialOrder.

The data set is ordered by Description, Start, SequentialOrder. This is the data for illustration purpose:

enter image description here

I would like to get the following results in a different data set (Start, End, Description) where Start is the minimum of the "Start" field in a set and End is the maximum of the "Start" field in the set. The set is defined by the following rule: Total number of rows in the new set does not exceed the maximumrow defined in the previous set and all rows in the new set are ordered by SequentialOrder.

Based on the rule above, I have the following sets: enter image description here

So the results I would like to see based on the illustration is

12-Jun-15, 14-Jun-15, A
01-Jul-15, 01-Jul-15, A
02-Jul-15, 04-Jul-15, A
01-Aug-15, 16-Aug-15, B

If that could be done, please advise. I know we could group by Description but I do not know if we could do further grouping based on MaximumRow and SequentialOrder: As mentioned above, the total row in the subset to be evaluated can not exceed the MaximumRow AND has to be in ordered by SequentialOrder.

I do not think that could be done without using cursor but I ask anyway just in case there is.

I have attached the script to generate the sample above:

  CREATE TABLE "TEST" 
   (    "Start" DATE, 
    "Description" VARCHAR2(20 BYTE), 
    "MaximunRow" NUMBER, 
    "SequentialOrder" NUMBER
   ) 
SET DEFINE OFF;
Insert into TEST ("Start","Description","MaximunRow","SequentialOrder") values (to_date('12-JUN-15','DD-MON-RR'),'A',3,3);
Insert into TEST ("Start","Description","MaximunRow","SequentialOrder") values (to_date('13-JUN-15','DD-MON-RR'),'A',3,4);
Insert into TEST ("Start","Description","MaximunRow","SequentialOrder") values (to_date('14-JUN-15','DD-MON-RR'),'A',3,5);
Insert into TEST ("Start","Description","MaximunRow","SequentialOrder") values (to_date('01-JUL-15','DD-MON-RR'),'A',3,4);
Insert into TEST ("Start","Description","MaximunRow","SequentialOrder") values (to_date('02-JUL-15','DD-MON-RR'),'A',3,3);
Insert into TEST ("Start","Description","MaximunRow","SequentialOrder") values (to_date('04-JUL-15','DD-MON-RR'),'A',3,4);
Insert into TEST ("Start","Description","MaximunRow","SequentialOrder") values (to_date('01-AUG-15','DD-MON-RR'),'B',2,5);
Insert into TEST ("Start","Description","MaximunRow","SequentialOrder") values (to_date('16-AUG-15','DD-MON-RR'),'B',2,7);
1

There are 1 best solutions below

3
On BEST ANSWER

This is a variation of a gaps-and-islands problem, with the added complication of the maximum number of rows in each island. This is a bit long-winded but you could start by identifying the groups caused by the sequence order:

select t.*,
  row_number() over (partition by "Description" order by "Start") as rn,
  case when lag("SequentialOrder")
    over (partition by "Description" order by "Start") < "SequentialOrder"
    then 1 else 0 end as newblock
from test t
order by "Start";

Start     Description MaximunRow SequentialOrder  RN   NEWBLOCK
--------- ----------- ---------- --------------- --- ----------
12-JUN-15 A                    3               3   1          0
13-JUN-15 A                    3               4   2          1
14-JUN-15 A                    3               5   3          1
01-JUL-15 A                    3               4   4          0
02-JUL-15 A                    3               3   5          0
04-JUL-15 A                    3               4   6          1
01-AUG-15 B                    2               5   1          0
16-AUG-15 B                    2               7   2          1

You can then use a recursive CTE (from 11gR2 onwards) based on that:

with u as (
  select t.*,
    row_number() over (partition by "Description" order by "Start") as rn,
    case when lag("SequentialOrder")
      over (partition by "Description" order by "Start") < "SequentialOrder"
      then 1 else 0 end as newblock
  from test t
),
r ("Start", "Description", "MaximunRow", "SequentialOrder", rn, blocknum,
  pos, lastmaxrow) as (
  select u."Start", u."Description", u."MaximunRow", u."SequentialOrder", u.rn,
    1, 1, u."MaximunRow"
  from u
  where rn = 1
  union all
  select u."Start", u."Description", u."MaximunRow", u."SequentialOrder", u.rn,
    case when r.pos = r.lastmaxrow or u.newblock = 0
      then r.blocknum + 1 else r.blocknum end,
    case when r.pos = r.lastmaxrow or u.newblock = 0
      then 1 else r.pos + 1 end,
    case when r.pos = r.lastmaxrow or u.newblock = 0
      then r.lastmaxrow else u."MaximunRow" end
  from r
  join u on u."Description" = r."Description" and u.rn = r.rn + 1
)
select * from r
order by "Start";

Start     Description MaximunRow SequentialOrder  RN   BLOCKNUM  POS LASTMAXROW
--------- ----------- ---------- --------------- --- ---------- ---- ----------
12-JUN-15 A                    3               3   1          1    1          3
13-JUN-15 A                    3               4   2          1    2          3
14-JUN-15 A                    3               5   3          1    3          3
01-JUL-15 A                    3               4   4          2    1          3
02-JUL-15 A                    3               3   5          3    1          3
04-JUL-15 A                    3               4   6          3    2          3
01-AUG-15 B                    2               5   1          1    1          2
16-AUG-15 B                    2               7   2          1    2          2

This is assigning a blocknum to each row, with that starting at one for each description in the anchor member, and being incremented in the recursive member either if the newblock is zero (indicating a sequence break) or the number of members in the block is the previous maximum. (I may not have the logic for 'previous maximum' quite right as it isn't clear in the question.)

You can then group by the description and the generated block number:

with u as (
  select t.*,
    row_number() over (partition by "Description" order by "Start") as rn,
    case when lag("SequentialOrder")
      over (partition by "Description" order by "Start") < "SequentialOrder"
      then 1 else 0 end as newblock
  from test t
),
r ("Start", "Description", "MaximunRow", "SequentialOrder", rn, blocknum,
  pos, lastmaxrow) as (
  select u."Start", u."Description", u."MaximunRow", u."SequentialOrder", u.rn,
    1, 1, u."MaximunRow"
  from u
  where rn = 1
  union all
  select u."Start", u."Description", u."MaximunRow", u."SequentialOrder", u.rn,
    case when r.pos = r.lastmaxrow or u.newblock = 0
      then r.blocknum + 1 else r.blocknum end,
    case when r.pos = r.lastmaxrow or u.newblock = 0
      then 1 else r.pos + 1 end,
    case when r.pos = r.lastmaxrow or u.newblock = 0
      then r.lastmaxrow else u."MaximunRow" end
  from r
  join u on u."Description" = r."Description" and u.rn = r.rn + 1
)
select min(r."Start") as "Start", max(r."Start") as "End", r."Description"
from r
group by r."Description", r.blocknum
order by r."Description", r.blocknum;

Start     End       Description
--------- --------- -----------
12-JUN-15 14-JUN-15 A          
01-JUL-15 01-JUL-15 A          
02-JUL-15 04-JUL-15 A          
01-AUG-15 16-AUG-15 B          

Your sample data doesn't trigger the maximum rows break as you don't have any sequences longer than 3 anyway. With some additional data:

Insert into TEST ("Start","Description","MaximunRow","SequentialOrder") values (to_date('15-JUN-15','DD-MON-RR'),'A',3,7);
Insert into TEST ("Start","Description","MaximunRow","SequentialOrder") values (to_date('16-JUN-15','DD-MON-RR'),'A',3,8);
Insert into TEST ("Start","Description","MaximunRow","SequentialOrder") values (to_date('17-JUN-15','DD-MON-RR'),'A',3,10);
Insert into TEST ("Start","Description","MaximunRow","SequentialOrder") values (to_date('18-JUN-15','DD-MON-RR'),'A',3,12);
Insert into TEST ("Start","Description","MaximunRow","SequentialOrder") values (to_date('19-JUN-15','DD-MON-RR'),'A',3,13);

the same query gets:

Start     End       Description
--------- --------- -----------
12-JUN-15 14-JUN-15 A          
15-JUN-15 17-JUN-15 A          
18-JUN-15 19-JUN-15 A          
01-JUL-15 01-JUL-15 A          
02-JUL-15 04-JUL-15 A          
01-AUG-15 16-AUG-15 B          

so you can see it's splitting on sequence change and on hitting three rows in the block.

SQL Fiddle demo.

You could get away with just the recursive CTE, and not the previous intermediate one, by comparing the sequential order directly in the case statements instead of using newblock; but having rn to find the next row is easier than trying to find the next date as they aren't contiguous.