How to group data where no logical grouping is available in oracle

54 Views Asked by At

I have a data like below in table

id code sinfo tdate
1  x     y   1-Jan-07
1  x     y   6-apr-07
1  x     y   9-apr-07

and I want data like below

enter image description here

If there is any gap between the dates then it should check the next day whether the data available or not if available the end date should be the tdate of that row.

I tried to achieve this through analytic functions but no luck.

3

There are 3 best solutions below

0
Littlefoot On

Something like this, perhaps?

Sample data:

SQL> with
  2  test (id, code, sinfo, tdate) as
  3    (select 1, 'X', 'Y', date '2007-01-01' from dual union all
  4     select 1, 'X', 'Y', date '2007-04-06' from dual union all
  5     select 1, 'X', 'Y', date '2007-04-09' from dual
  6    ),

Query begins here:

  7  temp as
  8    (select id, code, sinfo,
  9       tdate,
 10       lead(tdate) over (partition by id order by tdate) next_tdate
 11     from test
 12    )
 13  select id, code, sinfo,
 14    tdate,
 15    case when to_char(tdate, 'yyyymm') = to_char(next_tdate, 'yyyymm') then next_tdate
 16         else tdate
 17    end tenddate
 18  from temp
 19  where next_tdate is not null;

        ID CODE SINFO TDATE      TENDDATE
---------- ---- ----- ---------- ----------
         1 X    Y     01.01.2007 01.01.2007
         1 X    Y     06.04.2007 09.04.2007

SQL>
0
MT0 On

From Oracle 12, you can use MATCH_RECOGNIZE for row-by-row pattern matching.

It is not clear what the logic is for grouping your rows but if it was that rows are grouped when each row is within 3 days of the previous row then:

SELECT *
FROM   table_name
       MATCH_RECOGNIZE(
         PARTITION BY id, code, sinfo
         ORDER BY tdate
         MEASURES
           FIRST(tdate) AS tdate,
           LAST(tdate) AS tenddate
         PATTERN (first_row within_3_days*)
         DEFINE
           within_3_days AS tdate <= PREV(tdate) + INTERVAL '3' DAY
       )

Which, for the sample data:

CREATE TABLE table_name (id, code, sinfo, tdate) AS
SELECT 1, 'X', 'Y', DATE '2007-01-01' FROM DUAL UNION ALL
SELECT 1, 'X', 'Y', DATE '2007-04-06' FROM DUAL UNION ALL
SELECT 1, 'X', 'Y', DATE '2007-04-09' FROM DUAL;

Outputs:

ID CODE SINFO TDATE TENDDATE
1 X Y 2007-01-01 00:00:00 2007-01-01 00:00:00
1 X Y 2007-04-06 00:00:00 2007-04-09 00:00:00

If you have different business logic then amend the PATTERN and/or DEFINE clauses to reflect that logic.

If you want to group by month then you can simply use GROUP BY:

SELECT id,
       code,
       sinfo,
       MIN(tdate) AS tdate,
       MAX(tdate) AS tenddate
FROM   table_name
GROUP BY id, code, sinfo, TRUNC(tdate, 'MM')

Which outputs the same as MATCH_RECOGNIZE (since the groups are identical whether they are in the same month or within 3 consecutive days).

fiddle

0
d r On

As I undertood this it is about last and the row before last (keeping it within group). So, the question is what makes the group? Is it Just ID or ID, CODE or MONTH whatever... If that is the case then one of the options is to use Case expression and LEAD() analytic function. In this answeer I put the group (partition by) as ID, CODE, SINFO - it could be month with or without any other column, though. You should adjust it to your actual context and logic.

WITH    --  S a m p l e    D a t a :
    tbl (ID, CODE, SINFO, TDATE) AS
        ( Select 1, 'X', 'Y', DATE '2007-01-07' From Dual Union All
          Select 1, 'X', 'Y', DATE '2007-04-06' From Dual Union All
          Select 1, 'X', 'Y', DATE '2007-04-09' From Dual Union All
          --
          Select 1, 'X', 'Z', DATE '2007-01-27' From Dual Union All
          --
          Select 2, 'X', 'Q', DATE '2007-01-03' From Dual Union All
          Select 2, 'X', 'Q', DATE '2007-02-03' From Dual Union All
          Select 2, 'X', 'Q', DATE '2007-03-03' From Dual Union All
          Select 2, 'X', 'Q', DATE '2007-04-03' From Dual Union All
          --
          Select 2, 'Y', 'Q', DATE '2007-11-11' From Dual Union All
          Select 2, 'Y', 'Q', DATE '2007-12-12' From Dual
        )
--      M a i n    S Q L :
SELECT  ID, CODE, SINFO, TDATE, TENDDATE
FROM    ( Select    ID, CODE, SINFO, TDATE,
                  Count(*) OVER(Partition By ID, CODE, SINFO) "CNT", 
                  ROW_NUMBER() OVER(Partition By ID, CODE, SINFO Order By TDATE) "RN",
                  Case When     LEAD(TDATE, 2) Over(Partition By ID, CODE, SINFO Order By TDATE) Is Null And 
                              LEAD(TDATE, 1) Over(Partition By ID, CODE, SINFO Order By TDATE) Is Not Null
                       Then     LEAD(TDATE, 1) Over(Partition By ID, CODE, SINFO Order By TDATE) 
                  Else TDATE
                  End "TENDDATE"
          From    tbl
          Order By ID, CODE, SINFO, TDATE
        )
WHERE   CNT != RN Or (CNT = 1 And RN = 1)
/*    R e s u l t :
        ID CODE SINFO TDATE    TENDDATE
---------- ---- ----- -------- --------
         1 X    Y     07.01.07 07.01.07
         1 X    Y     06.04.07 09.04.07
         --
         1 X    Z     27.01.07 27.01.07
         --
         2 X    Q     03.01.07 03.01.07
         2 X    Q     03.02.07 03.02.07
         2 X    Q     03.03.07 03.04.07
         --
         2 Y    Q     11.11.07 12.12.07    */

NOTE: By changing Partition By clause of analytic functions and by adjusting Where clause of main SQL you could get different results and it's on you to make the logic behind this question's sample data provided and the expected result as presented....