Let's suppose I have the following table with a clustered index on a column (say, a)
CREATE TABLE Tmp
(
a int,
constraint pk_a primary key clustered (a)
)
Then, let's assume that I have two sets of a very large number of rows to insert to the table.
- 1st set) values are sequentially increasing (i.e., {0,1,2,3,4,5,6,7,8,9,..., 999999997, 999999998, 99999999})
- 2nd set) values are sequentially decreasing (i.e., {99999999,999999998,999999997, ..., 3,2,1,0}
do you think there would be performance difference between inserting values in the first set and the second set? If so, why?
thanks
SQL Server will generally try and sort large inserts into clustered index order prior to insert anyway.
If the source for the insert is a table variable however then it will not take account of the cardinality unless the statement is recompiled after the table variable is populated. Without this it will assume the insert will only be one row.
The below script demonstrates three possible scenarios.
OPTION (RECOMPILE)
is used so SQL Server compiles a plan suited for inserting 1,000,000 rows.Execution Plans
The third one has a sort operator to get the inserted values into clustered index order first.
Verify Results and clean up
STATISTICS TIME ON
resultsFragmentation Results
Conclusion
In this case all three of them ended up using exactly the same number of pages. However
Tmp2
is 99.97% fragmented compared with only 0.4% for the other two. The insert toTmp3
took the longest as this required an additional sort step first but this one time cost needs to be set against the benefit to future scans against the table of minimal fragmentation.The reason why
Tmp2
is so heavily fragmented can be seen from the below queryWith zero logical fragmentation the page with the next highest key value would be the next highest page in the file but the pages are exactly in the opposite order of what they are supposed to be.
The rows arrived in descending order so for example values 2834 to 2536 were put into page 26818 then a new page was allocated for 2535 but this was page 26819 rather than page 26817.
One possible reason why the insert to
Tmp2
took longer thanTmp1
is because as the rows are being inserted in exactly reverse order on the page every insert toTmp2
means the slot array on the page needs to be rewritten with all previous entries moved up to make room for the new arrival.