Estimate the table size in SQL Server

6.2k Views Asked by At

I need to estimate the database size for prerequisite, so I'm trying to understand how does SQL Server store data in the example below.

In my SQL Server database, I have a table named InfoComp which contains 4 rows:

IdInfoComp : Integer Not Null (PK)  
IdDefinition : Integer Not Null (FK)  
IdObject : Integer Not Null (FK)  
Value : NVarChar(Max) Not Null 

I want estimate the table size. In real usage, I can get the average length stored in Value with this SQL query:

SELECT AVG(Value) FROM InfoComp  
Result : 8

So, my calculation seems to be (in byte):

(Size(IdInfoComp) + Size(IdDefinition) + Size(IdObject) + AVG Size(Value)) * Rows count

( 4 + 4 + 4 + ((8 * 2) + 2)) * NbRows

But when I'm trying to apply this calculation in the real case, it's wrong. In my case, I have 3,250,273 rows so the result should be 92 MB, but MS SQL Report says:

(Data) 147 888 KB (Indexes) 113 072 KB and (Reserved) 261 160 KB.

Where am I wrong?

2

There are 2 best solutions below

0
Matt Fischer On

Try this...this gets me close. I used the msdn article to create . You can set the number of rows. This will do every table in the db including the indexes. Doesn't do columnstores yet and won't handle relationships. It will just apply the rowcount estimate to every table.

/*Do NOT change this section*/
GO
CREATE TABLE RowSizes (TypeName VARCHAR(30), TableName VARCHAR(255), IndexName VARCHAR(255), Null_Bitmap SMALLINT, VariableFieldSize BIGINT, FixedFieldSize BIGINT, Row_Size BIGINT, LOBFieldSize BIGINT);
CREATE TABLE LeafSizes (TypeName VARCHAR(30), TableName VARCHAR(255), IndexName VARCHAR(255), Row_Size BIGINT, Rows_Per_Page BIGINT, Free_Rows_Per_Page BIGINT, Non_Leaf_Levels BIGINT, Num_Leaf_Pages BIGINT, Num_Index_Pages BIGINT, Leaf_space_used_bytes BIGINT);
GO
CREATE PROCEDURE dbo.cp_CalcIndexPages
    @IndexType VARCHAR(20)
AS
BEGIN
    DECLARE @IndexName VARCHAR(255)
        , @TableName varchar(255)
        , @Non_Leaf_Levels bigint = 127
        , @Rows_Per_Page bigint = 476 
        , @Num_Leaf_Pages bigint =10000;

    WHILE EXISTS(SELECT TOP 1 1 FROM dbo.LeafSizes WHERE TypeName = @IndexType AND Num_Index_Pages = 0)-- AND IndexName = 'PK_ProcessingMessages')
    BEGIN
        SELECT TOP 1 @IndexName = IndexName
            , @TableName = TableName
            , @Non_Leaf_Levels = Non_Leaf_Levels
            , @Rows_Per_Page = Rows_Per_Page
            , @Num_Leaf_Pages = Num_Leaf_Pages
        FROM dbo.LeafSizes
        WHERE TypeName = @IndexType
            AND Num_Index_Pages = 0;

        DECLARE @Counter INT = 1
            , @Num_Index_Pages INT = 0;

        WHILE @Counter <= @Non_Leaf_Levels
        BEGIN
            BEGIN TRY

            SELECT @Num_Index_Pages += ROUND(CASE WHEN @Num_Leaf_Pages/POWER(@Rows_Per_Page, @Counter) < CONVERT(FLOAT, 1) THEN 1 ELSE @Num_Leaf_Pages/POWER(@Rows_Per_Page, @Counter) END, 0)
            END TRY

            BEGIN CATCH
                SET @Num_Index_Pages += 1
            END CATCH

            SET @Counter += 1
        END

        IF @Num_Index_Pages = 0 
            SET @Num_Index_Pages =  1;

        UPDATE dbo.LeafSizes
        SET Num_Index_Pages = @Num_Index_Pages
            , Leaf_space_used_bytes = 8192 * @Num_Index_Pages
        WHERE TableName = @TableName
            AND IndexName = @IndexName;

    END
END
GO
/*Do NOT change above here*/

--Set parameters here
DECLARE @NumRows INT = 1000000 --Number of rows for estimate
    ,@VarPercentFill money = .6; --Percentage of variable field space used to estimate.  1 will provide estimate as if all variable columns are 100% full.


/*Do not change*/
WITH cte_Tables AS (--Get Tables
    SELECT o.object_id, s.name+'.'+o.name AS ObjectName
    FROM sys.objects o
    INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
    WHERE type = 'U'
), cte_TableData AS (--Calculate Field Sizes
    SELECT o.ObjectName AS TableName
        , SUM(CASE WHEN t.name IN ('int', 'bigint', 'tinyint', 'char', 'datetime', 'smallint', 'date') THEN 1 ELSE 0 END) AS FixedFields
        , SUM(CASE WHEN t.name IN ('int', 'bigint', 'tinyint', 'char', 'datetime', 'smallint', 'date') THEN c.max_length ELSE 0 END) AS FixedFieldSize
        , SUM(CASE WHEN t.name IN ('varchar') THEN 1 ELSE 0 END) AS VariableFields
        , SUM(CASE WHEN t.name IN ('varchar') THEN c.max_length ELSE 0 END)*@VarPercentFill AS VariableFieldSize
        , SUM(CASE WHEN t.name IN ('xml') THEN 1 ELSE 0 END) AS LOBFields
        , SUM(CASE WHEN t.name IN ('xml') THEN 10000 ELSE 0 END) AS LOBFieldSize
        , COUNT(1) AS TotalColumns
    FROM sys.columns c
    INNER JOIN cte_Tables o ON o.object_id = c.object_id
    INNER JOIN sys.types t ON c.system_type_id = t.system_type_id
    GROUP BY o.ObjectName
), cte_Indexes AS (--Get Indexes and size
    SELECT s.name+'.'+o.name AS TableName
        , ISNULL(i.name, '') AS IndexName
        , i.type_desc
        , i.index_id
        , SUM(CASE WHEN t.name IN ('tinyint','smallint', 'int', 'bigint', 'char', 'datetime', 'date') AND c.key_ordinal > 0 THEN 1 ELSE 0 END) AS FixedFields
        , SUM(CASE WHEN t.name IN ('tinyint','smallint', 'int', 'bigint', 'char', 'datetime', 'date') AND c.key_ordinal > 0 THEN tc.max_length ELSE 0 END) AS FixedFieldSize
        , SUM(CASE WHEN t.name IN ('varchar') AND c.key_ordinal > 0 THEN 1 ELSE 0 END) AS VariableFields
        , SUM(CASE WHEN t.name IN ('varchar') AND c.key_ordinal > 0 THEN tc.max_length ELSE 0 END)*@VarPercentFill AS VariableFieldSize
        , SUM(CASE WHEN t.name IN ('xml') AND c.key_ordinal > 0 THEN 1 ELSE 0 END) AS LOBFields
        , SUM(CASE WHEN t.name IN ('xml') AND c.key_ordinal > 0 THEN 10000 ELSE 0 END) AS LOBFieldSize
        , SUM(CASE WHEN t.name IN ('tinyint','smallint', 'int', 'bigint', 'char', 'datetime', 'date') AND c.is_included_column > 0 THEN 1 ELSE 0 END) AS FixedIncludes
        , SUM(CASE WHEN t.name IN ('tinyint','smallint', 'int', 'bigint', 'char', 'datetime', 'date') AND c.is_included_column > 0 THEN 1 ELSE 0 END) AS FixedIncludesSize
        , SUM(CASE WHEN t.name IN ('varchar') AND c.is_included_column > 0 THEN 1 ELSE 0 END)*@VarPercentFill AS VariableIncludes
        , SUM(CASE WHEN t.name IN ('varchar') AND c.is_included_column > 0 THEN tc.max_length ELSE 0 END) AS VariableIncludesSize
        , COUNT(1) AS TotalColumns
    FROM sys.indexes i
    INNER JOIN sys.columns tc ON i.object_id = tc.object_id
    INNER JOIN sys.index_columns c ON i.index_id = c.index_id 
        AND c.column_id = tc.column_id
        AND c.object_id = i.object_id
    INNER JOIN sys.objects o ON o.object_id = i.object_id AND o.is_ms_shipped = 0
    INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
    INNER JOIN sys.types t ON tc.system_type_id = t.system_type_id
    GROUP BY s.name+'.'+o.name, ISNULL(i.name, ''), i.type_desc, i.index_id
)
INSERT RowSizes
SELECT 'Table' AS TypeName
    , n.TableName
    , '' AS IndexName
    , 2 + ((n.FixedFields+n.VariableFields+7)/8) AS Null_Bitmap
    , 2 + (n.VariableFields * 2) + n.VariableFieldSize AS Variable_Data_Size
    , n.FixedFieldSize
    /*FixedFieldSize + Variable_Data_Size + Null_Bitmap*/
    , n.FixedFieldSize + (2 + (n.VariableFields * 2) + (n.VariableFieldSize)) + (2 + ((n.FixedFields+n.VariableFields+7)/8)) + 4 AS Row_Size
    , n.LOBFieldSize
FROM cte_TableData n
UNION
SELECT i.type_desc
    , i.TableName
    , i.IndexName
    , 0 AS Null_Bitmap
    , CASE WHEN i.VariableFields > 0 THEN 2 + (i.VariableFields * 2) + i.VariableFieldSize + 4 ELSE 0 END AS Variable_Data_Size
    , i.FixedFieldSize
    /*FixedFieldSize + Variable_Data_Size + Null_Bitmap if not clustered*/
    , i.FixedFieldSize + CASE WHEN i.VariableFields > 0 THEN 2 + (i.VariableFields * 2) + i.VariableFieldSize + 4 ELSE 0 END + 7 AS Row_Size
    , i.LOBFieldSize
FROM cte_Indexes i
WHERE i.index_id IN(0,1)
UNION
SELECT i.type_desc
    , i.TableName
    , i.IndexName
    , CASE WHEN si.TotalColumns IS NULL THEN 2 + ((i.FixedFields+i.VariableFields+i.VariableIncludes+i.FixedIncludes+8)/8) 
            ELSE 2 + ((i.FixedFields+i.VariableFields+i.VariableIncludes+i.FixedIncludes+7)/8)
        END AS Null_Bitmap
    , CASE WHEN si.TotalColumns IS NULL THEN 2 + ((i.VariableFields + 1) * 2) + (i.VariableFieldSize + 8)
            ELSE 2 + (i.VariableFields * 2) + i.VariableFieldSize 
        END AS Variable_Data_Size
    , CASE WHEN si.TotalColumns IS NULL THEN si.FixedFieldSize
            ELSE i.FixedFieldSize + si.FixedFieldSize
        END AS FixedFieldSize
    /*FixedFieldSize + Variable_Data_Size + Null_Bitmap if not clustered*/
    , CASE WHEN si.TotalColumns IS NULL THEN i.FixedFieldSize + (2 + ((i.VariableFields + 1) * 2) + (i.VariableFieldSize + 8)) + (2 + ((i.TotalColumns+8)/8)) + 7
            ELSE i.FixedFieldSize + (2 + (i.VariableFields * 2) + i.VariableFieldSize) + (2 + ((i.TotalColumns+7)/8)) + 4 
        END AS Row_Size
    , i.LOBFieldSize
FROM cte_Indexes i
LEFT OUTER JOIN cte_Indexes si ON i.TableName = si.TableName AND si.type_desc = 'CLUSTERED'
WHERE i.index_id NOT IN(0,1) AND i.type_desc = 'NONCLUSTERED';

--SELECT * FROM RowSizes

/*Calculate leaf sizes for tables and HEAPs*/
INSERT LeafSizes
SELECT r.TypeName
    , r.TableName
    ,'' AS IndexName
    , r.Row_Size
    , 8096 / (r.Row_Size + 2) AS Rows_Per_Page
    , 8096 * ((100 - 90)/100) / (r.Row_Size + 2) AS Free_Rows_Per_Page
    , 0 AS Non_Leaf_Levels
    /*Num_Leaf_Pages = Number of Rows / (Rows_Per_Page - Free_Rows_Per_Page) OR 1 if less than 1*/
    , CASE WHEN @NumRows / ((8096 / (r.Row_Size + 2)) - (8096 * ((100 - 90)/100) / (r.Row_Size + 2))) < 1 
            THEN 1 
            ELSE @NumRows / ((8096 / (r.Row_Size + 2)) - (8096 * ((100 - 90)/100) / (r.Row_Size + 2))) 
        END AS Num_Leaf_Pages
    , 0 AS Num_Index_Pages
    /*Leaf_space_used = 8192 * Num_Leaf_Pages*/
    , 8192 * CASE WHEN @NumRows / ((8096 / (r.Row_Size + 2)) - (8096 * ((100 - 90)/100) / (r.Row_Size + 2))) < 1 
                THEN 1 
                ELSE @NumRows / ((8096 / (r.Row_Size + 2)) - (8096 * ((100 - 90)/100) / (r.Row_Size + 2))) 
            END + (@NumRows * LOBFieldSize) AS Leaf_space_used_bytes
FROM RowSizes r
WHERE r.TypeName = 'Table'
ORDER BY TypeName, TableName;

/*Calculate leaf sizes for CLUSTERED indexes*/
INSERT LeafSizes
SELECT r.TypeName
    , r.TableName
    , r.IndexName
    , r.Row_Size
    , 8096 / (r.Row_Size + 2) AS Rows_Per_Page
    , 0 AS Free_Rows_Per_Page
    , 1 + ROUND(LOG(8096 / (r.Row_Size + 2)), 0)*(l.Num_Leaf_Pages/(8096 / (r.Row_Size + 2))) AS Non_Leaf_Levels
    , l.Num_Leaf_Pages
    , 0 AS Num_Index_Pages 
    , 0 AS Leaf_space_used_bytes
FROM RowSizes r
INNER JOIN LeafSizes l ON r.TableName = l.TableName AND l.TypeName = 'Table'
WHERE r.TypeName = 'CLUSTERED';

PRINT 'CLUSTERED'
EXEC dbo.cp_CalcIndexPages @IndexType = 'CLUSTERED'

/*Calculate leaf sizes for NONCLUSTERED indexes*/
INSERT LeafSizes
SELECT r.TypeName
    , r.TableName
    , r.IndexName
    , r.Row_Size
    , 8096 / (r.Row_Size + 2) AS Rows_Per_Page
    , 0 AS Free_Rows_Per_Page
    , 1 + ROUND(LOG(8096 / (r.Row_Size + 2)), 0)*(l.Num_Leaf_Pages/(8096 / (r.Row_Size + 2))) AS Non_Leaf_Levels
    , l.Num_Leaf_Pages
    , 0 AS Num_Index_Pages 
    , 0 AS Leaf_space_used_bytes
FROM RowSizes r
INNER JOIN LeafSizes l ON r.TableName = l.TableName AND l.TypeName = 'Table'
WHERE r.TypeName = 'NONCLUSTERED';

PRINT 'NONCLUSTERED'
EXEC dbo.cp_CalcIndexPages @IndexType = 'NONCLUSTERED'

SELECT * 
FROM dbo.LeafSizes
--WHERE TableName = 'eligibility.clientrequest'

SELECT TableName
    , @NumRows AS RowsPerTable
    , @VarPercentFill*100 AS VariableFieldFillFactor
    , SUM(CASE WHEN TypeName = 'Table' THEN Leaf_space_used_bytes ELSE 0 END)/1024/1024 AS TableSizeMB
    , SUM(Leaf_space_used_bytes)/1024/1024 AS SizeWithIndexesMB
FROM LeafSizes
--WHERE TableName = 'eligibility.clientrequest'
GROUP BY TableName
ORDER BY TableName;


GO
/*Cleanup when done*/
DROP PROCEDURE dbo.cp_CalcIndexPages;
DROP TABLE dbo.RowSizes;
DROP TABLE dbo.LeafSizes;
0
Yoh Deadfall On

Unfortunately, I can't say why your calculation is wrong because there is no enough information about how the table is created and how the database is configured. So I'll try to answer in common, and you will have a tip.

The first thing you should know that size of any SQL Server database is greater or equal to the size of the model database. It's because the model database is a template for new databases and therefore it's copied every time when you execute the CREATE DATABASE statement.

All information in a database is stored in 8 KB pages on disk. There are many types of pages. Some of them (like allocation maps and metadata) are used for internal purposes, but the others - for storing data.

The table size depends on how the data organized on disk (does it have clustered index or not), column types, and data compression. The size of an index depends on existence of unique index on the indexed table, level count of the index, fill factor and so on.

As I said before, everything is stored in pages and data too. SQL Server has pages for in-row data, pages for row-overflow data, and pages for LOB data. The data pages consist of three major parts: the page header, data rows, and the data offset array.

The page header occupies the first 96 bytes of each data page, leaving 8,096 bytes for other components. The row offset array is a block of 2-byte entries which stored at the end of the page. The entry count is stored in the header and called slot count.

The area between the header and the row offset array is the area in which data rows are stored. Each row consist of two components: the fixed size part, and the variable length part.

The structure of data rows is:

  • Status Bits A, 1 byte
  • Status Bits B, 1 byte
  • Fixed-length size (FSize), 2 bytes
  • Fixed-length data (FData), FSize – 4
  • Number of columns (NCol), 2 bytes
  • NULL bitmap, Ceiling (NCol / 8)
  • Number of variable-length columns stored in row (VarCount), 2 bytes
  • Variable column offset array (VarOffset), 2 * VarCount
  • Variable-length data (VarData), VarOff[VarCount] - (Fsize + 4 + Ceiling (NCol / 8) + 2 * VarCount)

NOTE Index rows are stored in the same way as data rows.

Not all things I've explained here, but I hope this will help you to understand for what purposes SQL Server uses allocated space. Also you should keep in mind that database files grow by the size specified by the FILEGROWTH option, and that can result in the larger real size than the estimated.

Also take a look at the Microsoft SQL Server 2012 Internals book and read how to Estimate the Size of a Database. It probably will be interesting for you.