There are two tables, Costs
and Logs
. The data in Costs
table can be in the millions of rows, and in Logs
table it can be billions of rows.
I need to update the CostBy
column in Costs
table in a service task in production environment within 100 records each run.
CREATE TABLE Cost
(
C_PK uniqueidentifier primary key not null,
C_CostBy varchar(3) not null
)
CREATE TABLE Logs
(
L_PK uniqueidentifier primary key not null,
L_ParentTable varchar(255) not null, -- Table Cost and other table's name
L_ParentID uniqueidentifier not null, -- Cost's pk and other table's pk
L_Event varchar(3) not null, -- Part are 'ADD' and other event types
L_User varchar(3) not null
)
CREATE NONCLUSTERED INDEX [L_ParentID]
ON [dbo].[Costs] ([L_ParentID] ASC)
Here is the original update statement:
UPDATE TOP(100) Costs
SET CostBy = ISNULL(L_User, '~UK')
FROM Costs
LEFT JOIN Logs ON L_ParentID = C_PK AND L_Event = 'ADD'
WHERE CostBy = ''
However, the statement introducing a massive performance issue, high cost of table scan in Costs
table.
My question is how to avoid the table scan in Costs
table or how to optimize the update statement?
Thanks in advance.
You may want to try the following.
First, create an index on Logs, including all the relevant columns:
If this is a unique index, ie. only a single row will ever exist with ADD event for a given parent ID, make sure to make this a unique index as it can dramatically improve performance.
Second, and this is a hit and miss situation, you may try with an index on Costs (CostBy) because you're only looking for empty CostBy values to update. This index will need to be updated upon your query because it's updating it, so it may slow down your query instead of speeding it up. It depends on a number of factors.
If you have an enterprise license, try both with with
WITH (DATA_COMPRESSION = PAGE)
, it can significantly improve IO time at the expense of CPU. It depends which is your bottleneck.Additionally, depending on the nature of your data, updating statistics may improve your queries. If there is a disproportionate number of rows with CostBy = '' to other values in there, you may benefit from full statistics on that field. Consider
NORECOMPUTE
if you only need them for this specific query, this one time.