Using MSSQL 2022.
Source temp table has 156,289 rows. Destination table has zero rows. Use case is to insert rows from temp->destination where rows do not already exist in destination. 4 columns determine if a row is unique.
Using Merge When Not Matched by Target, the first time code is run it takes less than 10 seconds to complete, inserting 156,289. The second time it is run, it takes ~8 minutes to complete, inserting zero rows. The behavior is correct but it takes a very long time. https://www.brentozar.com/pastetheplan/?id=H7Cmfrd2n shows a 440mb memory grant and estimated rows 358,028 even though zero are written the second time.
Using Insert Where Not Exists, the the first time code is run it takes less than 10 seconds to complete, inserting 156,289. The second time it is run, it again takes a few seconds to complete, inserting zero rows. The behavior is correct and performs acceptably. https://www.brentozar.com/pastetheplan/?id=SyQ8fdu27 shows no memory grant and estimates 156,289 rows even though zero are written the second time.
Create Table #InputInterimTags ( AspectKey VarChar(400) , AspectValue VarChar(400) , InputInterimId Int , Tpid Int )
Create Clustered Index #InputInterimTags_Pk On #InputInterimTags (Tpid,AspectKey,AspectValue)
... temp table is populated...
Update Statistics #InputInterimTags With Fullscan
Create Table dbo.Aspects
(
AspectsId Int Identity(9,1) Primary Key
, RowStampUtc Datetime2(7)
, DictAspectsId Int
, AspectKey VarChar(400)
, AspectValue VarChar(400)
, DictAspectsAssociationStatesId Int
, Tpid BigInt
)
Create Index Aspects_Ix01 On dbo.Aspects (Tpid,AspectKey,AspectValue) Include (DictAspectsId)
Have tried different combinations of composite and single indexes on Aspects table, collation, additional filtering.
Thanks.
There are two issues here:
MERGEis unnecessary. As you have noted, you can just useINSERT...WHERE NOT EXISTS. But you don't need to, and should not, re-join#InputInterimTagsin theNOT EXISTSsubquery.UNIQUE, you are getting a many-to-many Merge Join in the plan, which is rather inefficient, and makes the server want to add a Spool as well.EXISTSsubquery does not need to select anything, it can justSELECT 1.You may want to consider making
Aspects_Ix01 On dbo.Aspectsa clustered index, and its primary key non-clustered, depending on your use case.Then your insert should be like this.
Consider using
TABLOCKon the inserted table also, assuming you are not worried about blocking anyone else.