Subquery slowing down update

42 Views Asked by At

This query references ~3M records and now takes an absolute age to run. The data is taken from an Excel Spreadsheet with Cust/Invoices down rows and monthly values 2016 to current in columns.

This query checks to see if there is a value in the same month for same/different products and outputs 1 if it can be ignored, 0 if it should be considered for subsequent queries.

I've set an index for product, which takes care of the initial criteria, but it is the sub-query that absolutely kills this:

    UPDATE tbl_transactions a
    SET ProdInCust_Mnth_Same_SameProd_LowerVal =
        CASE WHEN
                (
                    SELECT COUNT(TransactionID) 
                    FROM tbl_transactions_tmp b 
                    WHERE 
                        b.TransactionID<>a.TransactionID AND
                        b.CustomerRef=a.CustomerRef AND
                        b.TransMonth=a.TransMonth AND
                        (
                            (
                                (b.Product='PLATINUM') AND
                                b.TransValue<0
                            )
                            OR                                  
                            (
                                a.TransValue=0 AND
                                (b.Product='PLATINUM' OR b.Product='GOLD' OR b.Product='SILVER') AND
                                b.TransValue<0
                            )
                            OR
                            (
                                a.TransValue<0 AND
                                (b.Product='PLATINUM' OR b.Product='GOLD') AND
                                ((b.TransValue=a.TransValue AND b.RowReference>a.RowReference) OR
                                    b.TransValue<a.TransValue
                                )
                            )
                        )
                )>0 THEN 1 ELSE 0 END   
    WHERE Product='GOLD';

EXPLAIN produces:

id  select_type table   partitions  type    possible_keys   key key_len ref rows    filtered    Extra
1   UPDATE  a   \N  index   IDX_tbl_transactions_Product    PRIMARY 8   \N  2828152 100 Using where
2   DEPENDENT SUBQUERY  b   \N  ref IX_Transactions_SP_ProcessTransAA   IX_Transactions_SP_ProcessTransAA   45  finance.a.CustomerRef,finance.a.TransMonth  1   20.7    Using where; Using index

Visually, it says it's a FULL INDEX SCAN, I'm presuming the red background suggests this is bad.

Any ideas how I can optimise this further.

3

There are 3 best solutions below

0
On

Try to add indexes for all fields used in WHERE clause of the subquery to both tables and use EXISTS instead of COUNT:

UPDATE tbl_transactions a
SET ProdInCust_Mnth_Same_SameProd_LowerVal =
    CASE WHEN EXISTS
            (
                SELECT TransactionID
                FROM tbl_transactions_tmp b 
                WHERE 
                    b.TransactionID<>a.TransactionID AND
                    b.CustomerRef=a.CustomerRef AND
                    b.TransMonth=a.TransMonth AND
                    (
                        (
                            (b.Product='PLATINUM') AND
                            b.TransValue<0
                        )
                        OR                                  
                        (
                            a.TransValue=0 AND
                            (b.Product='PLATINUM' OR b.Product='GOLD' OR b.Product='SILVER') AND
                            b.TransValue<0
                        )
                        OR
                        (
                            a.TransValue<0 AND
                            (b.Product='PLATINUM' OR b.Product='GOLD') AND
                            ((b.TransValue=a.TransValue AND b.RowReference>a.RowReference) OR
                                b.TransValue<a.TransValue
                            )
                        )
                    )
            ) THEN 1 ELSE 0 END   
WHERE Product='GOLD';

ref: Is EXISTS more efficient than COUNT(*)>0?

0
On

Do you have this composite index? Or a wider index starting with these two columns?

INDEX(CustomerRef, TransMonth)   -- in either order

Would it be possible to compute the information, or part of the info, once a month for the previous month? Storing this in a Summary Table, then querying that table may be a much faster way to go.

Do you happen to know whether you are I/O-bound or not? If you are I/O-bound, what is the value of innodb_buffer_pool_size, how much RAM do you have, and how big (GB) is the table?

0
On

Can you try something like below- Use CTE or temp table and evaluate the expected result using case and where. Use this table value while updating put appropriate where clasus. Hope this may help you in creating query. Query might not give exact result but can help you in creating query.

UPDATE  a
    SET ProdInCust_Mnth_Same_SameProd_LowerVal = c.val
    tbl_transactions a
    JOIN cte c on a.TransactionID = c.TransactionID        
    --WHERE Product='GOLD';

WITH cte AS
(
SELECT b.TransactionID,  b.CustomerRef,b.TransMonth,b.TransValue, COUNT(TransactionID) ,
    case when  COUNT(TransactionID) > 0 then 1 else 0 END as val
                    FROM tbl_transactions_tmp b 
                    WHERE 
                       -- b.TransactionID<>a.TransactionID AND
                       -- b.CustomerRef=a.CustomerRef AND
                        b.TransMonth=a.TransMonth AND
                        (
                            (
                                (b.Product='PLATINUM') AND  b.TransValue<0
                            )
                            OR                                  
                            (
                                b.TransValue=0 AND
                                (b.Product='PLATINUM' OR b.Product='GOLD' OR b.Product='SILVER') AND
                                b.TransValue<0
                            )
                            OR
                            (
                                b.TransValue<0 AND
                                (b.Product='PLATINUM' OR b.Product='GOLD') AND
                                ((b.TransValue=a.TransValue AND b.RowReference>a.RowReference) OR
                                    b.TransValue<a.TransValue
                                )
                            )
                        )
                )
                group by b.CustomerRef
)