SQL Server query to group and rank records using repeated values

49 Views Asked by At

Below is the data in my Source table:

Report_Date AccountNumber   eventtype
**1/1/2017  500000001   2
1/2/2017    500000001   2
1/3/2017    500000001   2**
**1/4/2017  500000001   1
1/5/2017    500000001   1**
**1/6/2017  500000001   2
1/8/2017    500000001   2
1/9/2017    500000001   2
1/10/2017   500000001   2
1/11/2017   500000001   2**
**1/12/2017 500000001   1
1/13/2017   500000001   1
1/15/2017   500000001   1
1/16/2017   500000001   1
1/17/2017   500000001   1**
**1/18/2017 500000001   2
1/19/2017   500000001   2
1/20/2017   500000001   2**

My sql server query output needs to be like:

AccountNumber   eventtype   StartDt EndDt
500000001   2   1/1/2017    1/3/2017
500000001   1   1/4/2017    1/5/2017
500000001   2   1/6/2017    1/11/2017
500000001   1   1/12/2017   1/17/2017
500000001   2   1/18/2017   1/20/2017

I have tried a lot using all possible combinations using RANK, ROW_NUMBER, DENSE_RANK etc. But struggling to get the desired output. Any help will be highly appreciated.

The query I'm using is:

Select a.*, 
 row_number() Over (partition by eventtype ORDER BY  aCCOUNTnUMBER,eventtype) as rank 
from
(
SELECT 
    [Report_Date],
        [AccountNumber],

          case when [DelqLevel] > 0 Then '1' 
  Else '2' End as eventtype


  FROM tab 
where 
[AccountNumber] = '500000001'
)a
Order by 2,1
3

There are 3 best solutions below

1
On BEST ANSWER

This is a gaps-and-islands problem. One solution is the difference of row numbers -- which happily works in SQL Server 2008:

select accountnumber, eventtype, min(report_date), max(report_date)
from (select t.*,
             row_number() over (partition by accountnumber order by report_date) as seqnum,
             row_number() over (partition by accountnumber, eventtype order by report_date) as seqnum_e
      from t
     ) t
group by accountnumber, eventtype, (seqnum - seqnum_e);

Why this works is a little tricky to explain. But if you look at the results of the subquery, you'll see how the difference in the two row number values identifies "adjacent" records with the same event type.

1
On

This is a typical gaps and island problem where a look back per partition is required. One way to do this is as follows:

DECLARE @T TABLE(ReportDate DATETIME, EventType INT, AccountNumber INT)
INSERT @T VALUES

    ('1/1/2017',2,2),
    ('1/2/2017',2,2),
    ('1/3/2017',2,2),
    ('1/4/2017',1,4),
    ('1/5/2017',1,5),
    ('1/6/2017',2,6),
    ('1/8/2017',2,7),
    ('1/11/2017',2,5)


;WITH Markers AS
(
    SELECT
        *,
        VirtualGroupID = SUM(IsNewGroup) OVER (ORDER BY ReportDate)
    FROM
    (        
        SELECT
            EventType, AccountNumber,
            ReportDate,
            IsNewGroup = CASE WHEN ISNULL(LAG(EventType) OVER (ORDER BY ReportDate),EventType)<>EventType THEN 1 ELSE 0 END
        FROM
            @T d
    )AS X
)

SELECT 
    EventType= MAX(EventType),
    ReportDate = MIN(ReportDate),
    MReportDate = MAX(ReportDate)
FROM 
    Markers
GROUP BY
    VirtualGroupID
0
On

I think you can try this:

SELECT
    AccountNumber,
    EventType,
    MIN(ReportDate) StartDt,
    MAX(ReportDate) EndDt

FROM (
    SELECT
        t.*,
        ISNULL(t2.ReportDate, CAST('9999-12-31 23:59:59.998' AS DATETIME)) GroupDate
    FROM tab t

    LEFT JOIN tab t2
        ON t2.AccountNumber = t.AccountNumber
            AND t2.ReportDate > t.ReportDate
            AND t2.EventType <> t.EventType

    WHERE
        t.AccountNumber = '500000001'
) t3

GROUP BY
    GroupDate, AccountNumber, EventType

Hope this could helps.