In SQL Server, I am trying to put together a single query which grabs a row and includes the aggregated data from a two hour window before that row as well as aggregated data from one hour window after. How can I make this run faster?
The rows have time stamps to a millisecond precision, and are not evenly spaced. I have over 50 million rows in this table, and the query does not seem to be completing. There are indexes in many places, but they don't seem to help. I was also thinking about using a window function, but I am not sure that its possible to have a sliding window with unevenly distributed rows. Also, for the future one hour window, I am not sure how that would be done with a SQL window.
Box is a string and has 10 unique values. Process is a string and has 30 unique values. The average duration_ms is 200 ms. Errors account for less than 0.1% of the data. The 50 million rows describes a years worth of data.
select
c1.start_time,
c1.end_time,
c1.box,
c1.process,
datediff(ms,c1.start_time,c1.end_time) as duration_ms,
datepart(dw,c1.start_time) as day_of_week,
datepart(hour,c1.start_time) as hour_of_day,
c3.*,
c5.*
from metrics_table c1
cross apply
(select
avg(cast(datediff(ms,c2.start_time,c2.end_time) as numeric)) as avg_ms,
count(1) as num_process_total,
count(distinct process) as num_process_unique,
count(distinct box) as num_box_unique
from metrics_table c2
where datediff(minute,c2.start_time,c1.start_time) <= 120
and c1.start_time> c2.start_time
and c2.error_code = 0
) c3
cross apply
(select
avg(case when datediff(ms,c4.start_time,c4.end_time)>1000 then 1.0 else 0.0 end) as percent_over_thresh
from metrics_table c4
where datediff(hour,c1.start_time,c4.start_time) <= 1
and c4.start_time> c1.start_time
and c4.error_code= 0
) c5
where
c1.error_code= 0
Edit
Version: SQL Azure 12.0
The following should be a step in the right direction... Note: c2.start_time & c4.start_time are no longer wrappen in DATEDIFF functions making them SARGable...
Of course, making a query SARGable doesn't do any good unless there's an appropriate index available. The following should be good for all 3 metrics_table references... (see what indexes are currently available, there's a chance that you may not need to create a new index)