Optimal way to transfer 4 Virtual SDA Tables in HANA DB


Problem: My team is currently undergoing an ERP migration from an ECC system to a new S/4HANA system. As part of go-live, our team needs to replicate all of the tables out of the S/4 system and into our SLT schema, which will host the data. The majority of the tables will be handled by SLT replication out of SAP, but due to tight timelines we have identified 4 tables that would require multiple days of replication. The idea is to copy the existing data out of the remote source (ABAP/SDA) and place it in our SLT schema. Once that is accomplished, we can activate point-forward replication and allow all new or modified records to be updated via SLT replication.

Approaches Attempted: Our current approach is to establish an SDA connection with the backend S/4 database and break the data down by year, inserting it into our local table with a stored procedure. A number of issues have arisen with this approach, but it is currently working. It's just super slow.

Questions for the forum:

  • Is this how you would approach this type of problem? If not, what is your proposed solution?
  • Do you see anything in the target table that needs to be customized in order to increase performance?
  • Does anything stand out to you in the stored procedure that may need to be tuned?

Example: Let's pretend we have a source table named: A_tbl

  • 500 million records in the A_tbl
  • Roughly 500 columns wide

Then we will have our target table: B_tbl (a rough DDL sketch follows this list)

  • Same # of columns as A_tbl (500)
  • Round Robin Partition of 12
  • Indexed on 5 columns
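
A minimal sketch of what the B_tbl definition could look like (the column names here are hypothetical placeholders for the ~500 real columns):

CREATE COLUMN TABLE B_TBL_SCHEMA.B_TBL (
    YEAR      NVARCHAR(4),
    DOC_NBR   NVARCHAR(10),
    -- ... roughly 500 further columns ...
    AMOUNT    DECIMAL(15, 2)
)
PARTITION BY ROUNDROBIN PARTITIONS 12; -- round-robin partitioning, 12 partitions

-- One of the 5 secondary indexes (index and column names are placeholders)
CREATE INDEX IDX_B_TBL_DOC_NBR ON B_TBL_SCHEMA.B_TBL (DOC_NBR);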

Current Procedure:

CREATE OR REPLACE procedure LOAD_B_TBL_FROM_A_TBL ()
as
begin

    declare v_offset_nbr integer;
    declare v_record_count integer;
    declare v_commit_count integer;
    declare i integer;
    declare v_year nvarchar(4);
    declare v_record_per_commit_count CONSTANT INT = 1000000; 
    declare v_table_name CONSTANT NVARCHAR(30) = 'A_TBL';   
    declare v_start_year CONSTANT INT = 2011;
    declare v_end_year CONSTANT INT = 2022;
    declare year_nbr integer;

    
    for year_nbr in v_start_year..v_end_year do
    
        select IfNull(max(offset_nbr),0) into v_offset_nbr from B_TBL_SCHEMA.bulk_load_log where table_name = :v_table_name AND year_nbr = to_varchar(:year_nbr); -- Get offset number of records
        
        select count(*) into v_record_count from A_TBL_SCHEMA.A_TBL A_TBL WHERE A_TBL.YEAR = to_varchar(:year_nbr); -- Count the source records.
    
        v_record_count = v_record_count - v_offset_nbr; -- Subtract out the records already committed for the current year. Failsafe if procedure fails
        
        v_commit_count = v_record_count / v_record_per_commit_count; -- Number of times we need to loop
        
        IF v_record_count < v_record_per_commit_count THEN -- Don't enter the loop if it's not necessary
            INSERT INTO B_TBL_SCHEMA.B_TBL (
                SELECT * FROM A_TBL_SCHEMA.A_TBL A_TBL
                WHERE A_TBL.YEAR = to_varchar(:year_nbr)
            ); -- Insert into our target table 
            COMMIT;
            
            
            insert into B_TBL_SCHEMA.bulk_load_log values(
                v_table_name,
                to_varchar(:year_nbr),
                :v_offset_nbr,
                now()
            ); -- Insert into a logging table to keep up with offset
    
        ELSE
    
            for i in 0..v_commit_count do -- Loop number of commit times. (500 million / 1 million) = 500 commits necessary to process entire table
        
                INSERT INTO B_TBL_SCHEMA.B_TBL (
                    SELECT * FROM A_TBL_SCHEMA.A_TBL A_TBL
                    WHERE A_TBL.YEAR = to_varchar(:year_nbr)
                    LIMIT :v_record_per_commit_count OFFSET :v_offset_nbr
                ); -- Insert into our target table
                COMMIT;
            
                v_offset_nbr = v_offset_nbr + v_record_per_commit_count; -- Update the offset before logging so we know where to begin if procedure fails
                
                insert into B_TBL_SCHEMA.bulk_load_log values(
                    v_table_name,
                    to_varchar(:year_nbr),
                    :v_offset_nbr,
                    now()
                ); -- Insert into logging table to keep up with offset
                COMMIT;
            end for;
        end if;
    end for;
end;

There are 2 answers below.

Answer 1:

I think the easiest and fastest way to transfer the tables without any additional administration is the EXPORT statement with the BINARY format option. This can also be done in HANA Studio via the context menu on the schema, or via File -> Export... -> SAP HANA -> Catalog Objects and File -> Import... on the target.
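
In SQL, a rough sketch (the server-side path and the thread count below are only placeholders):

-- Source system: export the table in the internal binary (columnar) format.
EXPORT A_TBL_SCHEMA.A_TBL AS BINARY INTO '/usr/sap/trans/a_tbl_export' WITH THREADS 8;

-- Target system: import the exported files; the table arrives with the same schema and name.
IMPORT A_TBL_SCHEMA.A_TBL FROM '/usr/sap/trans/a_tbl_export' WITH THREADS 8;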

With this approach you can manually set the number of threads for export and import, with no additional tricky code. After the import, the target system has the same table with the same structure in the same schema as the source system, so if you need the table under a new name or schema you have to copy it first in the source. After the import you can insert ... select ... into the target table, create a copy table with the desired partitioning in the source system, or repartition the imported table in the target and use it as the target table.
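
For that last step, a hedged sketch using the table names from the question (adjust to your landscape):

-- Option 1: repartition the imported copy in place and use it as the target table.
ALTER TABLE A_TBL_SCHEMA.A_TBL PARTITION BY ROUNDROBIN PARTITIONS 12;

-- Option 2: load the prepared target table from the imported copy.
INSERT INTO B_TBL_SCHEMA.B_TBL SELECT * FROM A_TBL_SCHEMA.A_TBL;
COMMIT;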

The advantages:

  • You do not need any code or SDA connection between the systems.
  • You do not need to check and recheck whether your code worked fine and all the data was transferred without duplicates.
  • You will have exactly the same table in the target.
  • The table is exported and imported in the internal columnar format in parallel, so no tuple reconstruction happens in between (I do not know whether the HANA SDA driver is smart enough to transfer the columns over the SDA channel and do tuple reconstruction in the target). Column and record counts have little influence: only columns with high-cardinality data take time, while empty or low-cardinality columns are exported in the blink of an eye.

And finally, I've tested this in my system via HANA Studio (import to the local machine): a table with 130M records and 57 columns, about 5 GB in size, was exported with 8 threads within 6 minutes.

As for your original approach: you should always disable or drop indexes and constraints for really bulky operations and rebuild/re-enable them at the very end, to avoid the cost of index maintenance and constraint checks during the inserts.
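
For example (the index name is hypothetical, standing in for whichever of the 5 indexes exist on B_tbl):

-- Before the bulk load: drop the secondary indexes on the target table.
DROP INDEX B_TBL_SCHEMA.IDX_B_TBL_DOC_NBR;

-- ... run the bulk inserts ...

-- After the load: recreate the index once instead of maintaining it on every insert.
CREATE INDEX IDX_B_TBL_DOC_NBR ON B_TBL_SCHEMA.B_TBL (DOC_NBR);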

Answer 2:

The performance bottleneck is probably the transfer over SDA, which uses only one thread.

A workaround could be:

  • create a function with an input parameter to select a subset of table A, for instance the year
  • put a list of years in a table variable
  • call map_merge to force parallelism and channel the output into the target table.

Here's some pseudo code.

-- virtual table to source A over SDA: VT_TBL_A
-- target table B: TBL_B

CREATE OR REPLACE FUNCTION F_TBL_A_YEAR ( IN YEAR INT)
RETURNS TABLE ( ... )
as begin
   return select * from VT_TBL_A where year(MY_DT_COLUMN) = :YEAR;
end;

DO
BEGIN
   DECLARE t_var LIKE TBL_B;

   --list of all 'partitions'
   years=select distinct year(MY_DT_COLUMN) as MY_YEAR from VT_TBL_A;

   --call the function for each year
   t_var = MAP_MERGE(:years, F_TBL_A_YEAR( :years.MY_YEAR));

   insert into TBL_B select * from :t_var;
   commit;
END;
 

While this runs, I suggest that you check the system view M_REMOTE_STATEMENTS to confirm that records are transferred over multiple connections.
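
A simple check along these lines should be enough (each parallel SDA transfer should appear as its own entry):

SELECT * FROM M_REMOTE_STATEMENTS;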