Data compression algorithm in SAP clusters

1.2k Views Asked by At

I'm interested in data decompression in SAP systems.

Which algorithm is used for compression/decompression in clustered tables? For example, in RFBLG table. I read something about the LZ algorithm but I'm not sure how it exactly works.

Is there a detailed document describing how it works in SAP?

2

There are 2 best solutions below

0
On BEST ANSWER

The compression mechanisms would be highly dependent on the DB backend behind the Netweaver.

You can view compression method in Database Utility (SE14) by Go To->Storage Parameters.

enter image description here

Generally SAP uses three compression types for clustered (as well) data:

  • NONE. No compression
  • ROW. It stores row in a variable length and search repetitive patterns to compress.
  • PAGE. It is performed on top of row compression.

However, their implementation by DB vendors may differ significantly.

SAP and MS created MSSCOMPRESS report for performing table compression. Look at these articles too, where MS mentions about UCS-2 compression for SAP systems:

Oracle involves its own compression mechanisms which are described, for example, in note 1436352, called Oracle Database 11g Advanced Compression for SAP Systems (S-LOGIN required).

DB2 uses LZ2 (Lempel-Z) algorithm as your correctly stated. Here is detailed manual.

0
On

The table RFBLG is a special database table whose compressed part is managed by the ABAP kernel, not by the database. RFBLG is known as a SAP Table Cluster.

A SAP Table Cluster contains Cluster Table(s) which can be accessed transparently in ABAP Open SQL as if it was a true table. Only ABAP SQL (known as SAP "Open SQL") can read the Cluster Tables.

SAP never published the compression algorithm. SAP just explains the general logic of Table Clusters.

RFBLG is a well-known Table Cluster in SAP ERP software, which contains the data of Cluster Tables BSEC, BSED, BSEG, BSES, BSET. Note that there are many other Table Clusters (CDPOS, EDI40, etc.)

If you want to decompress a Table Cluster/Cluster Table, you must create a (very simple) ABAP program which reads the Cluster Table(s), and which you wrap in a webservice or whatever, that you may call from an external software. Note that the Function Module RFC_READ_TABLE (or any other "ersatz") is a generic solution to read any table, even Cluster Tables, but has many flaws (not related to Cluster Tables).

Note that with ABAP 7.53, SAP has abandoned Table Clusters (see below links). Cluster Tables (BSEG...) were all transformed into normal database tables, the Table Clusters (RFBLG...) don't exist anymore.

NB: in ABAP, it's non-sense to read the Table Clusters because the content will be read as compressed. ABAP programs only read the Cluster Tables (except if there's a special need of course).

More information

Just to clarify what this "SAP cluster thing" is, here is an example based on BSEG and RFBLG.

Technical structure (ABAP Data Dictionary/SE11) and contents for both (same lines are shown):

  • RFBLG:

  • BSEG:

Relationship between ABAP and database:

  • ABAP SQL:

    SELECT * FROM bseg INTO TABLE @DATA(itab).
    
  • Technically speaking, the above SELECT executes 3 operations:

    1. The ABAP kernel transforms the ABAP SQL into the SQL of the database (RDBMS) connected to SAP ERP software:
      SELECT * FROM rfblg INTO TABLE @DATA(itab).
      
    2. The database runs the transformed SQL statement.
    3. The ABAP kernel decompresses the rows of column VARDATA of RFBLG into rows of structure BSEG ("columns" BUZEI, BUZID, etc.)

References:

  • ABAP Doc 7.52: Cluster Tables in Table Clusters

    [ed: the figure below is not part of the ABAP Doc, it's taken from the SAP Help Portal > Pooled and Cluster Tables]

    "A table cluster is defined as a database table in the database. It contains all rows of the cluster tables assigned to it. Here, multiple rows from various cluster tables are grouped in a byte string. There is one row and (potentially) multiple continuation rows for this byte string. A table cluster has the following table fields for this:"

    "Custom key fields CLKEY1, CLKEY2, ... with any name and any type permitted for key fields except decimal floating point numbers."

    "Key field PAGENO with the type INT2 for indicating continuation rows."

    "TIMESTMP with the type CHAR of the length 14."

    "PAGELG with the type INT2 for the length of the byte string in VARDATA."

    "VARDATA: with the type RAW for the rows (compressed in a byte string) of the assigned cluster tables determined using the key fields CLKEY1, CLKEY2, ... Alongside the actual values, the byte string contains information about the structure of the data and from which cluster table it comes. If the byte string is longer than the maximum length of the VARDATA field, a continuation row is written using the same key values. The continuation rows of a key are distinguished by their values in the field PAGENO. The actual length of the byte string is saved in the field PAGELG."

  • Table cluster:

    "Before Release 7.53, database table in the database that contains the data of multiple cluster tables. Cluster tables are no longer supported from Release 7.53 and all table clusters were removed."

  • Cluster table:

    "Before Release 7.53, table category of a DDIC database table in ABAP Dictionary. A cluster table was not defined as such on the database. Instead, multiple cluster tables were stored in a table cluster in the database. Only ABAP SQL could be used to access cluster tables, whereby certain restrictions applied. Cluster tables are no longer supported as of Release 7.53. Any existing cluster tables were converted to transparent tables."