Fastest way to query latest version in large SQL Server table?

153 Views Asked by At

What is the fastest way to query for "latest version" on a large SQL table using an update timestamp in SQL Server?

I'm currently using the inner join approach, on very large SQL Server weather forecast table by Date, City, Hour, Temperature, UpdateTimestamp. To get the latest temperature forecast, I created a view using inner join on Date, City, and Hour + max(UpdateTimestamp), such as in this other posting.

However as the dataset is growing on the original table, the view query is getting slower and slower over time.

Wonder if others have encountered similar situation, and what's the best way to speed up this query (one alternative solution I'm considering is having a stored procedure run each day creating a separate table of the "latest version" only, which then will be very quick to access).

EDIT 4/4 - I've found the best solution so far (thanks Vikram) was to add a clustered index to my table on 3 fields "TSUnix", "CityId", "DTUnix", which sped up performance by ~4x (from 25 seconds to 4 seconds)

Also I've tried to use row_number solution (below query sample) , although appears bit slower than the "inner join" approach. Both queries + index creation are below :

Index Creation:

USE [<My DB>]
GO
CREATE NONCLUSTERED INDEX [index_WeatherForecastData]
ON [dbo].[<WeatherForecastData>] ([TSUnix], [CityId], [DTUnix])
INCLUDE ([Temperature], [TemperatureMin], [TemperatureMax], [Humidity], [WindSpeed], [Rain], [Snow])
GO

Query:

-- Inner Join Version

SELECT W.TSUnix, W.CityId, W.DTUnix, W.Temperature, W.*

FROM WeatherForecastData W

INNER JOIN (
    SELECT max(TSUnix) Latest, CityId, DTUnix 
    FROM WeatherForecastData 
    GROUP BY CityId, DTUnix
    ) L
    ON L.Latest = W.TSUnix
    AND L.CityID = W.CityID
    AND L.DTUnix = W.DTUnix

-- Row Number Version

SELECT W.TSUnix, W.CityId, W.DTUnix, W.Temperature, W.*

FROM 
    (
    select 
        *, ROW_NUMBER() over (partition by DTUnix, CityId order by TSUnix desc) as RowNumber
    from WeatherForecastData
    ) W

WHERE
    W.RowNumber = 1

Thanks!

1

There are 1 best solutions below

3
RnP On

Use ROW_NUMBER with an index as shown below.

The specific index that will make this fast is an index that has Date, City, Hour and UpdateTimestamp descending. This requires a single pass over the table rather than multiple passes an INNER JOIN would likely require.

Working code: http://sqlfiddle.com/#!18/8c0b4/1

SELECT Date, City, Hour, Temperature 
FROM
    (SELECT 
         Date, City, Hour, Temperature,
         ROW_NUMBER() OVER(PARTITION BY Date, City, Hour
                           ORDER BY UpdateTimestamp DESC) AS RowNumber
     FROM
         Test) AS t  
WHERE 
    t.RowNumber = 1