sql: how to find consecutive dates where condition is met in aggregate

93 Views Asked by At

I am working in Azure SQL MI. I am struggling with a certain SQL query. I have a "timesheet" table like the following:

EmployeeID    DateWorked    LaborType    Hours
----------------------------------------------
1             2024-01-01    work         13
1             2024-01-02    work         12
1             2024-01-03    no work      24
1             2024-01-04    work         8
2             2024-01-01    no work      24
2             2024-01-02    work         11
2             2024-01-03    work         8
2             2024-01-04    work         13

Every employee will have 1 and only 1 row for every date. The "no work" hours will always be 24 hours.

We have a rule where an employee is not allowed to work more than 24 hours across 2 consecutive days. I need to find all instances of this rule violation. For example, employee 1 violates the rule on dates 2024-01-01 and 2024-01-02 because he worked 25 (13 + 12) hours. He does not violate the rule on dates 2024-01-02 and 2024-01-03 because the 2024-01-03 row is "no work". Similarly, employee 2 has no rule violations.

So, I expect output like the following:

EmployeeID     ViolationDates            TotalHoursWorked
-------------------------------------------------------
1              2024-01-01, 2024-01-02        25

How can I write a query to return this output?

If it helps, below is some T-SQL for the sample data.

CREATE TABLE dbo.timesheet_mockup
    (
        EmployeeID int
        ,DateWorked date
        ,LaborType varchar(7)
        ,Hours int
    )
    ;

INSERT INTO
    dbo.timesheet_mockup
VALUES
    (1, '2024-01-01', 'work', 13)
    ,(1, '2024-01-02', 'work', 12)
    ,(1, '2024-01-03', 'no work', 24)
    ,(1, '2024-01-04', 'work', 8)
    ,(2, '2024-01-01', 'no work', 24)
    ,(2, '2024-01-02', 'work', 11)
    ,(2, '2024-01-03', 'work', 8)
    ,(2, '2024-01-04', 'work', 13)
;
1

There are 1 best solutions below

7
shawnt00 On BEST ANSWER
select *, t1.Hours + t2.Hours as TotalHoursWorked
from dbo.timesheet_mockup t1 inner join dbo.timesheet_mockup t2
    on t2.EmployeeID = t1.EmployeeID
        and t2.DateWorked = dateadd(day, -1, t1.DateWorked)
        and t1.LaborType = 'work' and t2.LaborType = 'work'
where t1.Hours + t2.Hours > 24;

You could generalize this across multiple rows very readily with a cross apply:

select *
from dbo.timesheet_mockup t1 cross apply (
    select sum(Hours) as TotalHoursWorked
    from dbo.timesheet_mockup t2
    where   t2.EmployeeID = t1.EmployeeID
        and t2.DateWorked between dateadd(day, -DAYS, t1.DateWorked) and t1.DateWorked
        and t2.LaborType = 'work'
) as lookback
where t1.LaborType = 'work' and lookback.TotalHoursWorked > MAXHOURS;

sum() over (... range between) would work except that SQL Server doesn't support this window with range. You could still filter the non-work days in the summation but it does assume there are no gaps in the data:

with data as (
    select *,
        sum(case when LaborType = 'work' then Hours else 0 end) over (
            partition by EmployeeID order by DateWorked
            rows between 1 preceding and current row) as TotalHoursWorked
    from dbo.timesheet_mockup
)
select * from data where TotalHoursWorked > 24;

Eventually this range query will work on SQL Server too: https://dbfiddle.uk/W8boAC2t (Postgres example)

And then there's always the simple option to compare lead()/lag() across adjacent rows and all the relevant columns. (Left as an exercise.)

To guarantee that worked dates are consecutive you could gather the number of non-work hours and then ensure the total was zero:

select EmployeeID, DateWorked as LastDateWorked, TotalHoursWorked
from dbo.timesheet_mockup t1 cross apply (
    select
        sum(case when t2.LaborType = 'work'    then Hours else 0 end) as TotalHoursWorked,
        sum(case when t2.LaborType = 'no work' then Hours else 0 end) as TotalHoursOff
    from dbo.timesheet_mockup t2
    where   t2.EmployeeID = t1.EmployeeID
        and t2.DateWorked between dateadd(day, -DAYS, t1.DateWorked) and t1.DateWorked
) as lookback
where t1.LaborType = 'work' and lookback.TotalHoursWorked > MAXHOURS and TotalHoursOff = 0
order by EmployeeID, LastDateWorked;

Or by only looking at consecutive dates in the first place (which also eliminates the concern about gaps in the data):

with consecutive_dates as (
   select *,
       datediff(day, cast('20000101' as date), DateWorked)
           - row_number() over (partition by EmployeeID order by DateWorked) as grp
   from dbo.timesheet_mockup
   where LaborType = 'work'
), agg as (
    select *,
        count(*) over (
            partition by EmployeeID, grp order by DateWorked
            rows between DAYS preceding and current row) as NumConsecutiveDates,
        sum(Hours) over (
            partition by EmployeeID, grp order by DateWorked
            rows between DAYS preceding and current row) as TotalHoursWorked
    from consecutive_dates
)
select EmployeeID, DateWorked as LastDateWorked, NumConsecutiveDates, TotalHoursWorked
from agg where TotalHoursWorked > MAXHOURS;