Best method for updating SQL Server partition function

170 Views Asked by At

I have several large SQL Server tables where the current partition functions are not extended beyond this month, and the procedure that is supposed to split the partition function into new date ranges before the final partition has data in it did not run as expected.

All of these tables have a clustered columnstore index. While that index is in place you cannot split a partition function range if the partition you are splitting has data.

My strategy was to drop the clustered columnstore index from each table that uses the partition function, then update the partition scheme and split the partition function range as needed, then rebuild the clustered columnstore index.

It would look something like this:

USE [Database]
GO

DROP INDEX [reporting_ClaimStatus2_cci] ON [reporting].[ClaimStatus]
GO

------------------------------------

ALTER PARTITION SCHEME [PS_RPT1_Date_Month_Right]
NEXT USED [FG_ClaimDetail_2023M12]
GO

ALTER PARTITION FUNCTION [PF_RPT1_Date_Month_Right]()
SPLIT RANGE ( N'2023-12-01T00:00:00.000') 
GO

------------------------------------------------------------

CREATE CLUSTERED COLUMNSTORE INDEX [reporting_ClaimStatus2_cci] 
ON [reporting].[ClaimStatus] 
WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0)
GO

My biggest issue is that I don't know how to estimate the amount of time or memory it will take to drop and recreate these indexes as some tables have more than 2B records.

Can you recommend how to estimate how long it will take to drop and recreate an index on a table, or can you recommend a different strategy for fixing the partitions?

2

There are 2 best solutions below

4
Ben Thul On

Can you recommend a different strategy for fixing the partitions?

Yes I can! Your observation of not being able to split a partition that has data in it is the key. The generally accepted best practice for avoiding that is to always have an empty partition on the side of the range that you're splitting. So, in your example (which I assume is in anticipation of the upcoming rollover into the month of December 2023), you'd also add a partition boundary for January 2024. Then, sometime in the month of December, you'll add a boundary for February 2024. By always staying one (or two, depending on how you're counting) month ahead of current, you're splitting an empty partition which is totally fine.

If you're paranoid, add a couple of partitions in there and stay several months ahead. Any way you do it though, I'd recommend adding monitoring for something like "Oh hey... there are fewer than «desired number» empty partitions at the end of the scheme. Might want to do something about that.".

6
Dan Guzman On

Missed partition maintenance can be a pain with large tables. This is especially true with a clustered colunmstore index due to the additional restriction that a non-empty partition cannot be split even when no data movement is necessary to accomodate the new partition boundary.

A much less heavy-handed remediation technique than rebuilding the entire table is with an alligned staging table. This allows one to use SWITCH to temporarly empty the partition for the SPLIT operation and move data back into the table after the SPLIT with SWITCH. This is very fast since SWITCH is a meta-data only operation; no data movement is required.

In summary:

  1. create a new partition function like the original plus the new month boundary
  2. create a new partiton scheme for the above function with the same fielgroup mappings as the original scheme plus the new partition
  3. create a new staging table exactly like the original table, including indexes
  4. create a temporary check constraint on the original table to ensure no data exists for the new partition boundary
  5. SWITCH the partition to be split (last partition) into the staging table, which will empty the partition and save data
  6. drop the temporary check constraint on the original table
  7. SPLIT the partition function to create the new monthly boundary
  8. SWITCH the partition back into the original table

These steps should take no more than a few seconds with the exception of creating the check constraint, which will need to validate no data exists after November 2023. Below is an example script.

--objects and data for simplified example
CREATE PARTITION FUNCTION PF_RPT1_Date_Month_Right (datetime) 
    AS RANGE RIGHT FOR VALUES(
          '2023-01-01T00:00:00.000'
        , '2023-02-01T00:00:00.000'
        , '2023-03-01T00:00:00.000'
        , '2023-04-01T00:00:00.000'
        , '2023-05-01T00:00:00.000'
        , '2023-06-01T00:00:00.000'
        , '2023-07-01T00:00:00.000'
        , '2023-08-01T00:00:00.000'
        , '2023-09-01T00:00:00.000'
        , '2023-10-01T00:00:00.000'
        , '2023-11-01T00:00:00.000'
    );
CREATE PARTITION SCHEME PS_RPT1_Date_Month_Right 
    AS PARTITION PF_RPT1_Date_Month_Right ALL TO ([PRIMARY]);
CREATE TABLE reporting.ClaimStatus(
    DateColumn datetime
) ON PS_RPT1_Date_Month_Right(DateColumn);
CREATE CLUSTERED COLUMNSTORE INDEX reporting_ClaimStatus2_cci ON reporting.ClaimStatus 
    ON PS_RPT1_Date_Month_Right(DateColumn);
INSERT INTO reporting.ClaimStatus VALUES
      ('2023-10-01T00:00:00.000')
    , ('2023-10-31T00:00:00.000')
    , ('2023-11-01T00:00:00.000')
    , ('2023-11-30T00:00:00.000');
GO

--create temporary partition function with new boundary
CREATE PARTITION FUNCTION PF_RPT1_Date_Month_Right_temp (datetime) AS RANGE RIGHT FOR VALUES(
      '2023-01-01T00:00:00.000'
    , '2023-02-01T00:00:00.000'
    , '2023-03-01T00:00:00.000'
    , '2023-04-01T00:00:00.000'
    , '2023-05-01T00:00:00.000'
    , '2023-06-01T00:00:00.000'
    , '2023-07-01T00:00:00.000'
    , '2023-08-01T00:00:00.000'
    , '2023-09-01T00:00:00.000'
    , '2023-10-01T00:00:00.000'
    , '2023-11-01T00:00:00.000'
    , '2023-12-01T00:00:00.000'
    );

--create temporary partition scheme with new partition
CREATE PARTITION SCHEME PS_RPT1_Date_Month_Right_temp 
    AS PARTITION PF_RPT1_Date_Month_Right_temp ALL TO ([PRIMARY]);

--create staging table exactly like the source table
CREATE TABLE reporting.ClaimStatus_temp(
    DateColumn datetime
) ON PS_RPT1_Date_Month_Right_temp(DateColumn);
CREATE CLUSTERED COLUMNSTORE INDEX reporting_ClaimStatus2_cci ON reporting.ClaimStatus_temp 
    ON PS_RPT1_Date_Month_Right_temp(DateColumn);

--add temporary constraint for November 2023 upper boundary to allow SWITCH
ALTER TABLE reporting.ClaimStatus
    ADD CONSTRAINT CK_ClaimStatus_DateColumn CHECK (DateColumn < '2023-12-01T00:00:00.000');

--switch November 2023 partition to staging table
ALTER TABLE reporting.ClaimStatus
    SWITCH PARTITION $PARTITION.PF_RPT1_Date_Month_Right('2023-11-01T00:00:00.000')
    TO reporting.ClaimStatus_temp PARTITION $PARTITION.PF_RPT1_Date_Month_Right_temp('2023-11-01T00:00:00.000');

--drop tempoorary constraint
ALTER TABLE reporting.ClaimStatus
    DROP CONSTRAINT CK_ClaimStatus_DateColumn;

--create new partiton boundary
ALTER PARTITION SCHEME PS_RPT1_Date_Month_Right NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION PF_RPT1_Date_Month_Right() 
    SPLIT RANGE('2023-12-01T00:00:00.000');

--switch November 2023 partiton back into primary table
ALTER TABLE reporting.ClaimStatus_temp
    SWITCH PARTITION $PARTITION.PF_RPT1_Date_Month_Right_temp('2023-11-01T00:00:00.000')
    TO reporting.ClaimStatus PARTITION $PARTITION.PF_RPT1_Date_Month_Right('2023-11-01T00:00:00.000');