I recently encountered an index in a database I maintain that was of the form:
CREATE INDEX [IX_Foo] ON [Foo]
( Id ASC )
INCLUDE
( SubId )
In this particular case, the performance problem that I was encountering (a slow SELECT filtering on both Id and SubId) could be fixed by simply moving the SubId column into the index proper rather than as an included column.
This got me thinking however that I don't understand the reasoning behind included columns at all, when generally, they could simply be a part of the index itself. Even if I don't particularly care about the items being in the index itself is there any downside to having column in the index rather than simply being included.
After some research, I am aware that there are a number of restrictions on what can go into an indexed column (maximum width of the index, and some column types that can't be indexed like 'image'). In these cases I can see that you would be forced to include the column in the index page data.
The only thing I can think of is that if there are updates on SubId, the row will not need to be relocated if the column is included (though the value in the index would need to be changed). Is there something else that I'm missing?
I'm considering going through the other indexes in the database and shifting included columns in the index proper where possible. Would this be a mistake?
I'm primarily interested in MS SQL Server, but information on other DB engines is welcome also.
The answers so far are all correct and all - but they might not convey enough what you gain from a covering index.
In your case, you have a table
Foo
and some fields, including anId
(which I assume is the primary key), and aSubId
which is some additional ID of some kind.You also have an index
IX_Foo
which I assume had onlyId
in it for now.So now you need to find the
SubId
forId=4
.IX_Foo
Id=4
in your indexIX_Foo
SubId
, tooIX_Foo
will contain the clustering key valueSubId
from itThe main point here is: once SQL Server has found your
Id=4
in theIX_Foo
index, it will then need to do another I/O operation, a bookmark lookup, to go fetch the whole data row, in order to be able to find theSubId
value.If you have a covering index, e.g.
IX_Foo
also includesSubId
, that extra I/O to do the bookmark lookup is eliminated. Once the valueId=4
is found in theIX_Foo
index, that index page in your non-clustered index will also include the value ofSubId
- SQL Server can now return those two values you asked for in your SELECT query without having to do an extra (potentially expensive and thus slow) bookmark lookup just to go fetch another Id column.That's the main benefit of covering indices - if you only need one or two extra columns, besides the index values you're doing the lookup on, by including those values into the index itself, you can save yourself a lot of bookmark lookups and thus speed things up significantly. You should however only include very few, and small bits of information - don't duplicate your entire data rows into all non-clustered indices! That's not the point.
UPDATE: the trade-off is this: if you have an index on (Id, SubId), all the pages in the index have both columns - the whole index tree through.
If you INCLUDE(SubId), the SubId fields are only present at the leaf level.
This means