Distance-based JOIN given Latitude/Longitude

Asked by Marsellus Wallace on 20 January 2012 at 21:20

Given the following tables:

table A (id, latitude, longitude)
table B (id, latitude, longitude)

how do I build an efficient T-SQL query that associates each row in A with the closest row in B?

The result set should contain every row in A, each associated with exactly one row in B. The format I'm looking for is:

(A.id, B.id, distanceAB)

I have a function that calculates the distance between two latitude/longitude pairs. I tried approaches using order by ... limit 1 and/or rank() over (partition by ...) as rowCount ... where rowCount = 1, but the results were either not really what I need or took too long to return.

Am I missing something?
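To pin down the expected result, here is a brute-force Python sketch (not T-SQL) of the semantics being asked for: every A row appears once, paired with its nearest B row. The asker's distance function is not shown, so haversine_km is an assumed stand-in using the haversine formula with the 6372.8 km radius that appears in one of the answers below. The (id, latitude, longitude) row tuples are a hypothetical model of the two tables, and B is assumed to be non-empty.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical row shape for tables A and B: (id, latitude, longitude)
Row = Tuple[int, float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km (assumed stand-in for the asker's function)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards asin against rounding pushing sqrt(h) just above 1
    return 2 * 6372.8 * math.asin(min(1.0, math.sqrt(h)))


def nearest_per_a(
    a_rows: List[Row],
    b_rows: List[Row],
    dist: Callable[[float, float, float, float], float] = haversine_km,
) -> List[Tuple[int, int, float]]:
    """Return one (A.id, B.id, distanceAB) tuple per A row by checking every B row."""
    result = []
    for aid, alat, alon in a_rows:
        # min() keeps the first B row it sees on a tie, so each A row gets exactly one match
        bid, d = min(
            ((bid, dist(alat, alon, blat, blon)) for bid, blat, blon in b_rows),
            key=lambda pair: pair[1],
        )
        result.append((aid, bid, d))
    return result
```

This evaluates the distance for every (A, B) pair, so its cost grows with |A| times |B|.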


There are 3 answers below.

Sparky On 20 January 2012 at 23:23

This is one approach that should have decent performance, but a big caveat is that it might not find any result for some rows in A:

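    -- top 1 inside cross apply picks the closest B per A row (dbo.yourFunction's argument order is assumed)
    select a.id as aid, x.bid, x.DistanceAB
    from a
    cross apply (
        select top 1 b.id as bid, dbo.yourFunction(a.latitude, a.longitude, b.latitude, b.longitude) as DistanceAB
        from b
        where b.latitude between a.latitude - 10 and a.latitude + 10
          and b.longitude between a.longitude - 10 and a.longitude + 10
        order by DistanceAB
    ) x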

What you are basically doing is looking for any B row inside a box extending 10 units (degrees) in each direction from A, and then sorting those candidates by your function to determine the closest. You can adjust the box size as needed. While it is not exact, it should reduce the number of rows your distance function has to evaluate and should give you decent performance.
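The box prefilter can be sketched in Python as follows (an illustration of the idea, not the query itself). dist_km is an assumed haversine stand-in for your distance function; like the BETWEEN predicates, the range checks are inclusive, and longitude wrap-around at the ±180 degree meridian is not handled.

```python
import math
from typing import List, Optional, Tuple

Row = Tuple[int, float, float]  # hypothetical (id, latitude, longitude) row


def dist_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km; an assumed stand-in for dbo.yourFunction."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6372.8 * math.asin(min(1.0, math.sqrt(h)))


def nearest_in_box(a: Row, b_rows: List[Row], window: float = 10.0) -> Optional[Tuple[int, int, float]]:
    """Closest B row among those within +/-window degrees of A on both axes.

    Returns None when the box is empty, which is the "might not find any result" case.
    """
    aid, alat, alon = a
    # Inclusive range checks, like BETWEEN; no handling of the +/-180 degree meridian
    candidates = [
        (bid, dist_km(alat, alon, blat, blon))
        for bid, blat, blon in b_rows
        if abs(blat - alat) <= window and abs(blon - alon) <= window
    ]
    if not candidates:
        return None
    bid, d = min(candidates, key=lambda c: c[1])
    return (aid, bid, d)
```

A larger window means fewer A rows without a result, at the cost of more distance evaluations per A row.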

tpolyak On 20 January 2012 at 23:43

This is possible by joining two subqueries. The first contains every distance between an A location and a B location; the second contains only the minimum distance from each A location to any B location.

    -- planar (Euclidean) distance over the raw degree values; ignores Earth's curvature
    SELECT x.aid, x.bid, x.Distance
    FROM
    (SELECT A.id AS aid,
            B.id AS bid,
            SQRT(SQUARE(A.latitude - B.latitude) + SQUARE(A.longitude - B.longitude)) AS Distance
         FROM A
         CROSS JOIN B) x JOIN
    (SELECT A.id AS aid,
            MIN(SQRT(SQUARE(A.latitude - B.latitude) + SQUARE(A.longitude - B.longitude))) AS Distance
         FROM A
         CROSS JOIN B
         GROUP BY A.id) y ON x.aid = y.aid AND x.Distance = y.Distance
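A Python sketch of the same two-step logic (all pairwise distances, a per-A minimum, then an equality join) makes one consequence easy to see: if two B rows are exactly equidistant from an A row, both survive the join, so that A row appears more than once. planar_distance is an assumed plain Euclidean distance over the degree values, the kind of planar measure this answer relies on; it ignores Earth's curvature.

```python
import math
from typing import Dict, List, Tuple

Row = Tuple[int, float, float]  # hypothetical (id, latitude, longitude) row


def planar_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Euclidean distance over raw degree values; ignores Earth's curvature
    return math.sqrt((lat1 - lat2) ** 2 + (lon1 - lon2) ** 2)


def nearest_by_min_join(a_rows: List[Row], b_rows: List[Row]) -> List[Tuple[int, int, float]]:
    # Subquery x: every (aid, bid, distance) pair from the cross join
    x = [
        (aid, bid, planar_distance(alat, alon, blat, blon))
        for aid, alat, alon in a_rows
        for bid, blat, blon in b_rows
    ]
    # Subquery y: minimum distance per aid (GROUP BY A.id)
    y: Dict[int, float] = {}
    for aid, _, d in x:
        y[aid] = min(d, y.get(aid, math.inf))
    # Join x to y on aid and equal distance; equidistant B rows all match
    return [(aid, bid, d) for aid, bid, d in x if d == y[aid]]
```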

Chad On 21 January 2012 at 02:14

There's no way to get around the fact that you're going to have to compare every record in A with every record in B, which is obviously going to scale poorly if both A and B contain a lot of records.

That being said, this will return correct results:

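    SELECT aid, bid, distanceAB
    FROM (
      SELECT aid, bid, distanceAB,
        -- row_number() rather than dense_rank() so ties still yield exactly one B row per A row
        row_number() over (partition by aid order by distanceAB) as n
      FROM (
        SELECT A.id as aid, B.id as bid,
          -- spherical law of cosines; 6372.8 is an approximate Earth radius in km
          acos(sin(radians(A.latitude)) * sin(radians(B.latitude)) +
            cos(radians(A.latitude)) * cos(radians(B.latitude)) *
            cos(radians(A.longitude - B.longitude))) * 6372.8 as distanceAB
        FROM A cross join B
      ) C
    ) D
    WHERE n = 1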

This will return in a reasonable amount of time if your sets aren't too large. With 3 locations in A and 130,000 or so in B, it takes about one second on my machine. 1,000 records in each takes about 40s. Like I said, it scales poorly.
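The ranking step can be sketched in Python: sorting the cross join by (A id, distance) and taking the first row of each group is what a ROW_NUMBER() OVER (PARTITION BY aid ORDER BY distanceAB) ... WHERE n = 1 filter does; unlike DENSE_RANK(), it yields exactly one row per A even when distances tie. slc_km uses the spherical law of cosines and the 6372.8 km radius from the query, and additionally clamps the cosine to [-1, 1], a guard (not present in the query) against rounding. Breaking ties by the smaller B id is an arbitrary choice of this sketch.

```python
import math
from itertools import groupby
from typing import List, Tuple

Row = Tuple[int, float, float]  # hypothetical (id, latitude, longitude) row
R_KM = 6372.8  # Earth radius in km, as used in the query


def slc_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Spherical law of cosines distance in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    c = (math.sin(p1) * math.sin(p2)
         + math.cos(p1) * math.cos(p2) * math.cos(math.radians(lon1 - lon2)))
    # Clamp: rounding can push c slightly outside [-1, 1], where acos is undefined
    return math.acos(max(-1.0, min(1.0, c))) * R_KM


def nearest_row_number(a_rows: List[Row], b_rows: List[Row]) -> List[Tuple[int, int, float]]:
    # Cross join, ordered by (aid, distance, bid) so each partition starts with its nearest B
    pairs = sorted(
        (aid, slc_km(alat, alon, blat, blon), bid)
        for aid, alat, alon in a_rows
        for bid, blat, blon in b_rows
    )
    result = []
    for aid, group in groupby(pairs, key=lambda p: p[0]):
        _, d, bid = next(group)  # first row of the partition, i.e. n = 1
        result.append((aid, bid, d))
    return result
```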

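It should be noted that Sparky's answer can return incorrect results under certain circumstances. Suppose your A location is at (+40, +100). A B location at (+40, +111) would not be returned, because its longitude is more than 10 degrees away, even though it is closer than (+49, +109), which falls inside the box.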
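The counterexample can be checked numerically. This Python sketch uses the haversine formula (an assumed stand-in for the distance function) and a ±10 degree window equivalent to the BETWEEN filter: (+40, +111) is closer to (+40, +100) than (+49, +109) is, yet only (+49, +109) passes the filter.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, r=6372.8):
    # Haversine great-circle distance; assumed stand-in for the distance function
    p1, p2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(min(1.0, math.sqrt(h)))


def in_box(a, b, window=10.0):
    # The +/-10 degree BETWEEN filter on (latitude, longitude)
    return abs(b[0] - a[0]) <= window and abs(b[1] - a[1]) <= window


a = (40.0, 100.0)
near, far = (40.0, 111.0), (49.0, 109.0)
print(round(haversine_km(*a, *near)), in_box(a, near))  # closer, but outside the box
print(round(haversine_km(*a, *far)), in_box(a, far))    # farther, but inside the box
```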
