left join tables which have same foreign key: hash key probe takes 50%

476 Views Asked by At

We have 3 tables:

  • Table0, containing the ID (= primary key)
  • Table1, contains amongst others the nullable FK towards the ID from table0
  • Table2, contains amongst others the nullable FK towards the ID from table0

Our query runs way too slow, even when it's indexed properly. When looking at the execution plan (SQL Server 2014), he looses a lot of time on that left outer join. SQL server uses a "Hash Match" instead, making it an inner join which costs 47% (or 50% if I don't explicitly set the [FI].[pId] = [FPF].[PId] in the where clause).

execution plan

The explaination states that he uses a "hash key probe" towards the [FI].[pId].

SELECT [FI].[ID], [FI].[Name], [FI].[Data]
FROM [dbo].[Table1] AS [FI] WITH (NOLOCK)
LEFT JOIN [Table2] AS [FPF] WITH (NOLOCK) ON [FI].[pId] = [FPF].[pId]
WHERE
[FI].[pId] = [FPF].[PId] AND -- If I add this explicitly, the query is already a lot faster
(
(
    [FI].[tId] = @tID --is bigint (FK)
    AND
    [Fi].[Name] = @Name --is varchar
)
OR
(
    [FI].[fiType] = 1
)
OR
(
    [FPF].[tId] = @tID
    AND
    [FPF].[Name] = @Name
))
ORDER BY [Fi].[Data]

I've even tried to link the table0 with the primary key too, but it makes no difference. Also using outer apply gives the same results. I've been playing around with indexes too on both tables, but without any profit.

Could someone share some thoughts on what I might be doing wrong here?

2

There are 2 best solutions below

1
On BEST ANSWER

Try and Minimize the possible number of records that is being Returned on the LEFT JOIN.

You're only ever interested in records from [Table2] with specific @name and @tID, so limit the size of the [Table2] result-set at the join point.

SELECT [FI].[ID], [FI].[Name], [FI].[Data]
FROM [dbo].[Table1] AS [FI] WITH (NOLOCK)
LEFT JOIN [Table2] AS [FPF] WITH (NOLOCK) ON [FI].[pId] = [FPF].[pId]
                                            AND [FPF].[tId] = @tID
                                            AND [FPF].[Name] = @Name
WHERE
[FI].[pId] = [FPF].[PId] AND -- If I add this explicitly, the query is already a lot faster
(
(
    [FI].[tId] = @tID --is bigint (FK)
    AND
    [Fi].[Name] = @Name --is varchar
)
OR
(
    [FI].[fiType] = 1
)
OR
(
    [FPF].[tId] = @tID
    AND
    [FPF].[Name] = @Name
))
ORDER BY [Fi].[Data]

On [Table2] use this index:

CREATE NONCLUSTERED INDEX idx ON [Table2](pID) INCLUDE (tId,Name)

Your extra bit of code can turn your query into an INNER JOIN. Which will be faster.

[FI].[pId] = [FPF].[PId] AND -- If I add this explicitly, the query is already a lot faster

Do you really need the ORDER BY? If not, then get rid of it.

0
On

Solved by adding data in the proposed index, and add it before the pid. Inner join made it a lot worse though