islands and gaps ordering issue MYSQL 8.0

125 Views Asked by At

I am trying to use partition by & row_number() to count consecutive duplicate values for a given date range.Essentially its attempting to capture "streaks" If there is a break in the streak the count should start over when the value occurs again.

To reproduce these results here is the code:

    CREATE TABLE partion_test (
    daily DATE,
    response_short_name VARCHAR(10)
    
);

INSERT INTO `partion_test` (`daily`, `response_short_name`) VALUES
('2020-09-21', 'A'),
('2020-09-25', 'A'),
('2020-09-26', 'A'),
('2020-09-27', 'A'),
('2020-09-28', 'A'),
('2020-09-22', 'B'),
('2020-09-20', 'C'),
('2020-09-23', 'C'),
('2020-09-24', 'C');





SELECT 
daily,
response_short_name
 ,row_number() over (partition by response_short_name order by daily) as seqnum 
FROM (

select
daily,
response_short_name

 FROM partion_test  
order by daily  limit 1000
) A;

HERE IS THE CURRENT OUTPUT

|   daily    | response_short_name | seqnum |  |
+------------+---------------------+--------+--+
| 2020-09-21 | A                   |      1 |  |
| 2020-09-25 | A                   |      2 |  |
| 2020-09-26 | A                   |      3 |  |
| 2020-09-27 | A                   |      4 |  |
| 2020-09-28 | A                   |      5 |  |
| 2020-09-22 | B                   |      1 |  |
| 2020-09-20 | C                   |      1 |  |
| 2020-09-23 | C                   |      2 |  |
| 2020-09-24 | C                   |      3 |  |
+------------+---------------------+--------+--+

HERE IS THE DESIRED OUTPOUT

 +------------+---------------------+--------+--+
|   daily    | response_short_name | seqnum |  |
+------------+---------------------+--------+--+
| 2020-09-20 | C                   |      1 |  |
| 2020-09-21 | A                   |      1 |  |
| 2020-09-22 | B                   |      1 |  |
| 2020-09-23 | C                   |      1 |  |
| 2020-09-24 | C                   |      2 |  |
| 2020-09-25 | A                   |      1 |  |
| 2020-09-26 | A                   |      2 |  |
| 2020-09-27 | A                   |      3 |  |
| 2020-09-28 | A                   |      4 |  |
+------------+---------------------+--------+--+

Ive been scratching at my brain for a while on this. Any help would be appreciated

2

There are 2 best solutions below

0
nbk On

Your data doesn't fit your result, so it is quite diffcult t achieve your result

    CREATE TABLE partion_test (
    daily DATE,
    response_short_name VARCHAR(10)
    
);

INSERT INTO `partion_test` (`daily`, `response_short_name`) VALUES
('2020-09-21', 'A'),
('2020-09-25', 'A'),
('2020-09-26', 'A'),
('2020-09-27', 'A'),
('2020-09-28', 'A'),
('2020-09-22', 'B'),
('2020-09-20', 'C'),
('2020-09-23', 'C'),
('2020-09-24', 'C');
select `daily`,`response_short_name`,
       row_number() over (partition by `response_short_name`, grp order by `daily`) as row_num
from (select t.*,
             (row_number() over (order by `daily`) -
              row_number() over (partition by `response_short_name` order by `daily`)
             ) as grp
      from partion_test t
     ) t
     ORDER BY `daily`
daily      | response_short_name | row_num
:--------- | :------------------ | ------:
2020-09-20 | C                   |       1
2020-09-21 | A                   |       1
2020-09-22 | B                   |       1
2020-09-23 | C                   |       1
2020-09-24 | C                   |       2
2020-09-25 | A                   |       1
2020-09-26 | A                   |       2
2020-09-27 | A                   |       3
2020-09-28 | A                   |       4

db<>fiddle here

0
The Impaler On

You can do:

select *,
  row_number() over(partition by grp order by daily) as seqnum  
from (
  select *,
    sum(inc) over(order by daily) as grp
  from (
    select *,
      case when lag(response_short_name) over(order by daily) = response_short_name
        then 0 else 1 end as inc
    from partion_test
    order by daily
  ) x
) y
order by daily

Result:

 daily       response_short_name  inc  grp  seqnum 
 ----------- -------------------- ---- ---- ------ 
 2020-09-20  C                    1    1    1      
 2020-09-21  A                    1    2    1      
 2020-09-22  B                    1    3    1      
 2020-09-23  C                    1    4    1      
 2020-09-24  C                    0    4    2      
 2020-09-25  A                    1    5    1      
 2020-09-26  A                    0    5    2      
 2020-09-27  A                    0    5    3      
 2020-09-28  A                    0    5    4      

See running example at DB Fiddle: