Selecting distinct consecutive rows amongst duplicates

1.6k Views Asked by At

I have a table with the structure below:

create table roster
(
date date not null,
first nvarchar(20) not null,
second nvarchar(20) not null,
third nvarchar(20) not null,
fourth nvarchar(20) not null,
)
go

And that has the following data inserted:

insert into roster values ('2015-06-10 12:45:34', 'e', 'm', 'a', 'r')
insert into roster values ('2015-06-11 12:45:34', 'e', 'v', 'a', 'r')
insert into roster values ('2015-06-12 12:45:34', 'e', 'm', 'a', 'd') 
insert into roster values ('2015-06-13 12:45:34', 'e', 'm', 'a', 'd') *
insert into roster values ('2015-06-14 12:45:34', 'e', 'm', 'a', 'r')
insert into roster values ('2015-06-15 12:45:34', 'e', 'm', 'a', 'r') *
insert into roster values ('2015-06-16 12:45:34', 'z', 'm', 't', 'r')

Note: * marks duplicates.

How do I select only one unique consecutive combination of "first", "second", "third", and "fourth"? For example, with the above inserted data, the desired output is:

Date                 First Second Third Fourth
2015-06-10 12:45:34, e     m      a     r
2015-06-11 12:45:34, e     v      a     r
2015-06-12 12:45:34, e     m      a     d
2015-06-14 12:45:34, e     m      a     r
2015-06-16 12:45:34, z     m      t     r

I am looking for a solution that preserves entries when they are no longer consecutive (or when the sequence is broken), but removes duplicates of consecutive entries.

I have seen similar questions posted here, but I have been uable to make a solution that uses a group by.

Any help would be appreciated

3

There are 3 best solutions below

4
On BEST ANSWER

Why need a group by if you just need a single exists()?

;with data as
(
    select ROW_NUMBER() OVER (ORDER BY date) AS number, * from roster
)
select * from data where 
    not exists -- Just compare with the previous column, if match say bye
    (
        select * from data prev where 1 = 1
            and prev.first = data.first 
            and prev.second = data.second 
            and prev.third = data.third 
            and prev.fourth = data.fourth 
            and prev.number + 1 = data.number
    )

SQL Fiddle

EDIT

;with data as
(
    select
        ROW_NUMBER() OVER (ORDER BY date) AS number, 
        ROW_NUMBER() OVER (PARTITION BY first, second, third, fourth ORDER BY date) AS part,
        *
    from roster
)
select MIN(date) as startdate, MAX(date) as enddate, COUNT(*) count, first, second, third, fourth
from data group by first, second, third, fourth, number - part
order by number - part

SQL Fiddle

3
On

You can group by the values of first, second, third, fourth then select the first date those values are encountered with min(date) or the last time they occur with max(date)

example for the last date encountered: fiddle

SELECT min(date) as startdate ,max(date) as enddate,  first, second, third, fourth 
from roster
GROUP BY first, second, third, fourth

EDIT: edited previous query to include start and enddate

EXTRA: something I was playing with when waiting for your reply: including a list of dates where the values occured in 1 field:

SELECT first, second, third, fourth,
STUFF((
          SELECT ',' + convert(varchar(25),T.date)
          FROM roster T
          WHERE A.first = T.first
          AND A.second = T.second
          AND A.third = T.third
          AND A.fourth = T.fourth
          ORDER BY T.date
          FOR XML PATH('')), 1, 1, '') as dates

from roster A
GROUP BY first, second, third, fourth

EDIT: I got pretty close to what you wanted but not quite, however I have no idea how to get it closer, I guess this is as far as I go, the rest is up to someone else :D : SQLFIDDLE

SELECT b.date as startdate, a.date as enddate, a.first, a.second, a.third, a.fourth FROM
(Select ROW_NUMBER() 
        OVER (ORDER BY first, second, third, fourth,date ) AS Row,
        date,
        first,second,third,fourth
from roster) A
JOIN 
(Select ROW_NUMBER() 
        OVER (ORDER BY first, second, third, fourth,date ) AS Row,
        date,
        first,second,third,fourth
from roster) B
ON A.row = b.row + 1
WHERE a.first = b.first
and a.second = b.second
and a.third = b.third
and a.fourth = b.fourth
UNION
select max(date) as startdate, null as enddate, first, second, third, fourth
FROM roster
group by first, second, third, fourth
having count(*) = 1;
0
On

I was looking for my own way of doing this, and figured out that you can also do it with window functions:

SELECT *,
  ARRAY_AGG(STRUCT(first, second, third, fourth)) OVER (
    ORDER BY date
    ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING 
  )[OFFSET(0)] prior_row
FROM
  roster r 
QUALIFY
  prior_row IS NULL
  OR prior_row <> STRUCT(first, second, third, fourth)