This query references ~3M records and now takes an absolute age to run. The data is taken from an Excel Spreadsheet with Cust/Invoices down rows and monthly values 2016 to current in columns.
This query checks to see if there is a value in the same month for same/different products and outputs 1 if it can be ignored, 0 if it should be considered for subsequent queries.
I've set an index for product, which takes care of the initial criteria, but it is the sub-query that absolutely kills this:
UPDATE tbl_transactions a
SET ProdInCust_Mnth_Same_SameProd_LowerVal =
CASE WHEN
(
SELECT COUNT(TransactionID)
FROM tbl_transactions_tmp b
WHERE
b.TransactionID<>a.TransactionID AND
b.CustomerRef=a.CustomerRef AND
b.TransMonth=a.TransMonth AND
(
(
(b.Product='PLATINUM') AND
b.TransValue<0
)
OR
(
a.TransValue=0 AND
(b.Product='PLATINUM' OR b.Product='GOLD' OR b.Product='SILVER') AND
b.TransValue<0
)
OR
(
a.TransValue<0 AND
(b.Product='PLATINUM' OR b.Product='GOLD') AND
((b.TransValue=a.TransValue AND b.RowReference>a.RowReference) OR
b.TransValue<a.TransValue
)
)
)
)>0 THEN 1 ELSE 0 END
WHERE Product='GOLD';
EXPLAIN produces:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 UPDATE a \N index IDX_tbl_transactions_Product PRIMARY 8 \N 2828152 100 Using where
2 DEPENDENT SUBQUERY b \N ref IX_Transactions_SP_ProcessTransAA IX_Transactions_SP_ProcessTransAA 45 finance.a.CustomerRef,finance.a.TransMonth 1 20.7 Using where; Using index
Visually, it says it's a FULL INDEX SCAN, I'm presuming the red background suggests this is bad.
Any ideas how I can optimise this further.
Try to add indexes for all fields used in WHERE clause of the subquery to both tables and use
EXISTS
instead ofCOUNT
:ref: Is EXISTS more efficient than COUNT(*)>0?