Clustered index key should be part of the non-clustered index key, but how can I see it?

287 Views Asked by At

In SQL Server, when a table has clustered index defined on it, the clustered index key will be implicitly added as a "hidden" key to any non-clustered indexes. But how come this "hidden" column doesn't show up in meta-data queries?

For example, if I define the following simple table:

USE [tempdb];

DROP TABLE IF EXISTS abc;
CREATE TABLE abc (col1 INT NOT NULL, col2 INT NOT NULL, col3 INT);

ALTER TABLE [dbo].[abc] ADD CONSTRAINT PK_abc PRIMARY KEY CLUSTERED ([col1]);

CREATE NONCLUSTERED INDEX IX_2 ON [dbo].[abc] ([col3])

And then run a query to view index information:

SELECT object_schema_name(t.object_ID)+'.'+t.name AS The_Table, 
       i.[name] AS The_Index,  
       i.[type_desc],
       index_column_id,
       col_name(Ic.Object_Id, Ic.Column_Id) AS The_Column --the column
FROM sys.tables t
INNER JOIN sys.indexes i
    ON t.object_ID=i.object_ID
INNER JOIN sys.Index_columns  ic
    ON i.Object_ID=ic.Object_ID
    AND i.index_ID=ic.index_ID
WHERE i.object_id = OBJECT_ID('dbo.abc') --<== filter by table if necessary
ORDER BY t.name,i.index_id, index_column_id;

I get this:


The_Table    The_Index    type_desc        index_column_id The_Column
------------ ------------ ---------------- --------------- -------------
dbo.abc      PK_abc       CLUSTERED        1               col1
dbo.abc      IX_2         NONCLUSTERED     1               col3

So it's easy to see that the non-clustered index doesn't contain "col1", which I would expect to be included since it's a clustered index key.

Is there a way to see that this column is actually a part of non-clustered index?

1

There are 1 best solutions below

2
seanb On

One way to see it is to do a SELECT from a simple table and see which indexes it uses.

In the setup below, I have

  • C1 as the primary key, but nonclustered
  • C2 as the clustered index,
  • C3 as a column that has a non-clustered index
  • C4 is a column made to just take up some space (makes clustered indexes more unwieldy)

(Note - I use a numbers table to populate the temporary table with 10,000 rows).

CREATE TABLE #Test (C1 int PRIMARY KEY NONCLUSTERED, C2 int, C3 int, C4 nvarchar(50));
CREATE CLUSTERED INDEX #CX_Test ON #Test (C2);
CREATE NONCLUSTERED INDEX #IX_Test ON #Test (C3);

-- Populate test with 10,000 rows
INSERT INTO #Test (C1, C2, C3, C4)
SELECT  n, n, n, N'Text here to fill up space so more reads on data'
FROM    dbo.Numbers
WHERE   n <= 10000;

UPDATE STATISTICS #Test;

Now, run SET STATISTICS TIME, IO ON; and also turn on 'Include actual execution plan', then execute the following

SELECT C3, C2
FROM   #Test;

SELECT C3, C1
FROM   #Test;

The first one (SELECTING C3, C2) does an index scan on the nonclustered index only - #IX_Test - showing the clustering column is also included in the nonclustered index.

On the other hand, the second one (SELECTING C3, C1) does a clustered index scan to get the information it needs in the C1 field - even though, in this case, it's the primary key. However, because it's not part of the nonclustered index, it has to get the information from the more-unwieldy clustered index (and correspondingly has an 8x higher number of logical reads than the first one).

Finally, you can do another type of test

SELECT TOP 10 C3, C2
FROM   #Test
ORDER BY C3;

SELECT TOP 10 C3, C1
FROM   #Test
ORDER BY C3;

These are similar to above, but because it's only taking the TOP 10 with ORDER BY C3, it's useful to use the nonclustered index.

In the first one, it only needs to read the nonclustered index as C2 is also included in the nonclustered index.

In the second one, it needs to do a key lookup to get the C1 data, as it is not included in the nonclustered index.

Finally, if I drop the clustered index (so now the table has no clustered index - it's just a heap) and run the last statements again

DROP INDEX #CX_Test ON #Test;

SELECT TOP 10 C3, C2
FROM   #Test
ORDER BY C3;

SELECT TOP 10 C3, C1
FROM   #Test
ORDER BY C3;

The plans are the same as each other - reading from the non-clustered index, then doing an RID lookup (e.g., using Row_ID rather than clustering index) to get the other fields.

Dropping the clustered index effectively changed the contents of the non-clustered index, but didn't have an actual change in its definition.

Edit: And here's a somewhat counterintuitive one. With the indexes as they were initially (before dropping the clustered index), run the following.

SELECT C2
FROM #Test;

In this case, it does a read from the non-clustered index, even though the fields actually specified in the index (C3) was not actually in the SELECT statement. SQL Server simply asked itself "should I get the data from the clustered index (which has a lot of extra reads) or the smaller non-clustered index?" - and obviously decided that the non-clustered was the way to go.


UPDATE - Data approach

In comments, the OP specified they wanted a method via meta-data.

I've been unable to get a meta-data method, but I've found one where you can use DBCC to look at the actual data directly, which shows the clustered index is included.

This is based on https://www.mssqltips.com/sqlservertip/1578/using-dbcc-page-to-examine-sql-server-table-and-index-data/ - but note it appears you no longer need to set the traceflag 3604.

For this, I'll create a new table TestTab in the database Testdb, with similar setup as above (nonclustered Primary Key, separate clustered index, a field for the nonclustered index, and another misc field).

CREATE TABLE TestTab (
  TT_ID int PRIMARY KEY NONCLUSTERED, 
  TT_CX int, 
  TT_IX_field nvarchar(5), 
  TT_Other_field nvarchar(50)
  );
CREATE CLUSTERED INDEX CX_TestTab ON TestTab (TT_CX);
CREATE NONCLUSTERED INDEX IX_TestTab ON TestTab (TT_IX_field);

INSERT INTO TestTab (TT_ID, TT_CX, TT_IX_Field, TT_Other_field) VALUES
(1, 15, 'DKTPE', 'Random text goes here'),
(2, 10, 'GEWPX', 'More random text');

Now, the first command is to get the relevant page info

DBCC IND('Testdb', TestTab, -1);

The first 11 columns are as below (with ObjectIDs and PartitionIDs 'X'ed out)

PageFID  PagePID  IAMFID  IAMPID  ObjectID  IndexID  PartitionNumber  PartitionID  iam_chain_type  PageType  IndexLevel
1        133075   NULL    NULL    XXXXXXX   1        1                XXXXX        In-row data     10        NULL
1        152432   1       133075  XXXXXXX   1        1                XXXXX        In-row data     1         0   
1        133077   NULL    NULL    XXXXXXX   2        1                XXXXX        In-row data     10        NULL
1        160528   1       133077  XXXXXXX   2        1                XXXXX        In-row data     2         0   
1        133078   NULL    NULL    XXXXXXX   3        1                XXXXX        In-row data     10        NULL
1        169849   1       133078  XXXXXXX   3        1                XXXXX        In-row data     2         0   

From there, the last row is the relevant one (it's the actual data for index 3). First column (FileID) is 1, 2nd column (PageID) is 169849 - use these as inputs for the next DBCC command (the last value, 3, represents the level of detail, I believe).

DBCC PAGE('Testdb',1,169849,3) WITH TABLERESULTS;

This reports two tables - the second table is the contents of the index. It shows the indexed field (TT_IX_Field) as well as the clustered index value (TT_CX).

FileId  PageId  Row  Level  TT_IX_field (key)  TT_CX (key)  UNIQUIFIER (key)  KeyHashValue    Row Size
1       169849  0    0      DKTPE              15           0                 (eba1206db4a9)  22
1       169849  1    0      GEWPX              10           0                 (0d8521571a6a)  22