rewrite query to remove inner query in tsql to optimize

135 Views Asked by At

I am trying to optimize the following query, According to execution plan, the sort in the inner query has a high cost. could the following query be re-written so that its easy to read and performs well?

select 
     CL.col1, CL.col2 
FROM 
     CLAIM CL WITH (NOLOCK) 
     INNER JOIN MEMBER MEM WITH (NOLOCK) ON MEM.MEMID=CL.MEMID 
     LEFT JOIN PAYVACATION PV WITH (NOLOCK) ON CL.CLAIMID = PV.CLAIMID 
         and pv.paymentid =
              (select top 1 PAYVACATION.paymentid 
                 from PAYVACATION WITH (NOLOCK), 
                      payment WITH (NOLOCK) 
               where 
                    payvoucher.claimid = cl.claimid 
                    and PAYVACATION.paymentid = payment.paymentid 
                      order by payment.paystatusdate desc)
3

There are 3 best solutions below

1
On BEST ANSWER
;WITH CTE AS
(
 select CL.col1, CL.col2, cl.claimid
  FROM CLAIM CL WITH (NOLOCK) 
  INNER JOIN MEMBER MEM WITH (NOLOCK)     ON MEM.MEMID=CL.MEMID 
  LEFT  JOIN PAYVACATION PV WITH (NOLOCK) ON CL.CLAIMID = PV.CLAIMID 
 ),
CTE2 AS 
(
select PAYVACATION.paymentid , PAYVACATION.claimid
      ,ROW_NUMBER() OVER (PARTITION BY PAYVACATION.claimid 
                     ORDER BY payment.paystatusdate desc) rn 
 from PAYVACATION WITH (NOLOCK)
 INNER JOIN payment WITH (NOLOCK) ON PAYVACATION.paymentid = payment.paymentid 
 INNER JOIN CTE WITH (NOLOCK)     ON PAYVACATION.claimid = cl.claimid 
)
SELECT CL.col1, CL.col2
FROM CTE CL 
INNER JOIN CTE2 C2 ON C2.claimid = CL.claimid
                  AND C2.rn = 1
0
On

There's a couple of things you'll need to fix before we can properly answer this question.

  1. Make sure the query works as it is. The version you've given us will not compile because of payvoucher.claimid. We can guess what it should be, but there's no use putting effort into it when it turns out to be something different.
  2. You probably run this on a Case Insensitive environment and it probably will work there, but as a rule you should try to keep your table, field, variable names 'case-consequent'. (As a .NET practitioner this should be second nature anyway =)
  3. It would help to have the table-definitions, indexes and a guesstimate on the number of records involved and if possible the way the data interacts.(lots of this connects to just couple of that etc...)
  4. Added bonus would be if you could tell us your expectations and also what other processes are on these tables and how badly our solution may affect those. (we probably can make the SELECT super-fast but at the cost of making the INSERT/UPDATE/DELETE fairly slow)

(and finally, get rid of the NOLOCK hints, or change them into their synonym READUNCOMMITTED and think about it if you still like them there as much as you do right now)

2
On

Assuming payvoucher.claimid in fact reffers to the payvacation table, you could format your query like this:

SELECT c.col1, c.col2
FROM claim c
INNER JOIN member m ON m.memid=c.memid
LEFT JOIN payvacation pv1 ON c.claimid = pv1.claimid
    AND pv1.paymentid = (
        SELECT TOP 1 pv2.paymentid
        FROM payvacation pv2
        INNER JOIN payment p ON pv2.paymentid = p.paymentid
        WHERE pv2.claimid = cl.claimid
        ORDER BY payment.paystatusdate DESC
    )

However, the entire LEFT JOIN is ignored if you do not select any column from the payvacation table. If you do select a column from the payvacation table, you will indeed get a costly Sort operator in the execution plan. To eliminate it, I would create an indexed view, like this:

CREATE VIEW indexed_view
WITH SCHEMABINDING AS
SELECT pv.paymentid, pv.claimid, p.paystatusdate
FROM dbo.payvacation pv
INNER JOIN dbo.payment p ON pv.paymentid = p.paymentid

GO
CREATE UNIQUE CLUSTERED INDEX PK_indexed_view ON indexed_view (paymentid)
CREATE INDEX i2 ON indexed_view (claimid, paystatusdate) INCLUDE (paymentid)

And then use the indexed view in the subquery, using the NOEXPAND hint:

SELECT c.col1, c.col2, pv1.paymentid
FROM claim c
INNER JOIN member m ON m.memid=c.memid
LEFT JOIN payvacation pv1 ON c.claimid = pv1.claimid
    AND pv1.paymentid = (
        SELECT TOP 1 iv.paymentid
        FROM dbo.indexed_view iv WITH (NOEXPAND)
        WHERE iv.claimid = c.claimid
        ORDER BY iv.paystatusdate DESC
    )

Using some random sample data, I get a query cost of 190.9 for the first query and a cost of 4.96 for the second query.