Oracle partition / archive strategy for type 2 dimension table

263 Views Asked by At

I have a really wide table in Oracle that's at type 2 dimension.

Records have from_dates and to_dates with the latest 'current' records having a high end date of 31st Dec 9999. There are currently two partitions on the table, one for the 'current' records and one for 'history' records.

There's a new requirement to only keep the last 12 months of records in the 'history' partition. I interpret this as keeping records that were valid in the last 12 months i.e. where the record's to_date < (this month- 11 months).

Normally if I wanted to get rid of records I'd just drop a partition, but in this case that wouldn't work as I need to retain some of the records in the existing 'history' partition.

Is there any partitioning strategy that could support this or am I barking up the wrong tree?

2

There are 2 best solutions below

1
On BEST ANSWER

You aren't accomplishing much with merely two partitions, "current" and "history". You need to repartition this by month. Then you can implement a rolling partition drop of partitions older than 12 months, which will require a bit of scripting.

Normally we use interval partitioning INTERVAL(NUMTOYMINTERVAL(1,'MONTH')) so we don't have to maintain partition adds manually or through scripting. However, unfortunately in your case you won't be able to because of your use of the special date 12/31/9999. This is the maximum date allowable in Oracle. Interval partitioning will internally add the interval to date values when determining whether a new partition is needed or not, and that will overflow the maximum date value allowed and raise an error. The use of this special date essentially disables the use of interval partitioning.

You have no choice but to either change your special "eternity" date to something less than one interval away from 12/31/9999 (anything less than 12/01/9999 would permit monthly interval partitioning, or anything less than 12/31/9998 would permit yearly interval partitioning). Or, as usually happens because code would have to be changed to accommodate these solutions, you have to manually build out partitions ahead of time or create a scheduled script that does it for you.

1
On

You need not to switch from your current two partition schema to a monthly partitioning to achieve an effective rolling window to eliminate the history records after N months. The important thing is than you need to use a ROW MOVEMENT in your partitioning schema, as the VALID_TO column used as partition key needs to be updated. This very same ROW MOVEMENT may be used to effectively remove the obsolete records with the help of subpartitions.

I assume a DDL such as

create table scd2 (
ID int,
VERSION INT,
VALIDFROM_D  DATE  NOT NULL,
VALIDTO_D    DATE  NOT NULL 
)
PARTITION BY range(VALIDTO_D)
(
  PARTITION SCD2_HIST VALUES LESS THAN (to_date('2500.01.01','yyyy.mm.dd') ),
  PARTITION SCD2_CURR
    VALUES LESS THAN (MAXVALUE)
)
enable row movement;

So you have two partitions SCD2_CURR and SCD2_HIST

The queries as select * from scd2 where VALIDTO_D = DATE'9999-12-31' goes only to the SCD2_CURR partition – see the execution plan below (consider Pstart – Pstop)

-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |     1 |    44 |   274   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|      |     1 |    44 |   274   (0)| 00:00:01 |     2 |     2 |
|*  2 |   TABLE ACCESS FULL    | SCD2 |     1 |    44 |   274   (0)| 00:00:01 |     2 |     2 |
-----------------------------------------------------------------------------------------------

To enable effective rolling window, add a new column IS_ACTIVE_HISTORY with DEFAULT 'Y' and a subpartition on this column.

SCD2 Table with Composite Partitions

create table scd2 (
ID int,
VERSION INT,
VALIDFROM_D  DATE  NOT NULL,
VALIDTO_D    DATE  NOT NULL,
IS_ACTIVE_HISTORY VARCHAR2(1) DEFAULT 'Y'
)
PARTITION BY range(VALIDTO_D)
 SUBPARTITION BY LIST (IS_ACTIVE_HISTORY)
SUBPARTITION TEMPLATE 
(
SUBPARTITION active_hist  VALUES ('Y'),      
SUBPARTITION obsolete_hist  VALUES (null)  
)
(
  PARTITION SCD2_HIST VALUES LESS THAN (to_date('2500.01.01','yyyy.mm.dd') ),
  PARTITION SCD2_CURR
    VALUES LESS THAN (MAXVALUE)
)
enable row movement;

You end with four subpartitions

SCD2_CURR_ACTIVE_HIST your current data

SCD2_CURR_OBSOLETE_HIST empty

SCD2_HIST_ACTIVE_HIST history data within the rolling window

SCD2_HIST_OBSOLETE_HIST history data to be deleted

New records are always stored in SCD2_CURR_ACTIVE_HIST

After an update the replaced version is moved in the partition SCD2_CURR_ACTIVE_HIST.

To implement the rolling window, you set up a periodic process (e.g. once a month or day) that reset the column IS_ACTIVE_HISTORY (with a simple update below)

-- move to obsolet history
update scd2
set IS_ACTIVE_HISTORY = null
where VALIDTO_D < add_months(trunc(sysdate),-12);

This moves the rows to the partition SCD2_CURR_OBSOLETE_HIST that is truncated in the following step

 alter table scd2 truncate subpartition  SCD2_HIST_OBSOLETE_HIST;

Note that your current setup with two partition (current / history) may not be the best choice.

A possible alternative is the range partitioning based on VALID_FROM date. VALID_FROM date is stable so there is no need for the row movement and this schema may also prune partition for some queries on the history data. Of course it can be combined with the subpartitions described above.