How to use Import Utilities like Fastload or MLOAD in Teradata, if the target table has Referential Integrity?


I'm new to Teradata, started exploring a few weeks back. I know Fastload or Multiload utilities will work only if there is no Referential Integrity on the tables, like Foreign key relationship. I wanted to know, what if my table actually has a foreign key reference, and I want to import data to that table from any text or delimited file. Is there a tweak in using fastload/mload or any alternate method to import?

Gowtham Vakani,

I believe you need TPump, but of the three Teradata load utilities it is the slowest for bulk volumes. I would recommend FastLoad instead: dump your source data into empty landing tables that FastLoad can load, then do the necessary transformation inside Teradata with plain SQL (for example, an INSERT-SELECT into the RI-constrained target). Below is some information on the three utilities and the differences between them, so you can choose based on your needs and performance requirements. I can send you some example scripts if you need any.
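To make the landing-table approach concrete, here is a hypothetical sketch (all table, column, and file names are invented for illustration; run the first part with the fastload utility and the second in any SQL session such as BTEQ):

```
/* Step 1: FastLoad the delimited file into an empty landing table
   (no RI, no secondary indexes). With VARTEXT input, every DEFINE
   field must be VARCHAR. */
LOGON tdpid/user,password;

DROP TABLE stg_orders;
DROP TABLE stg_orders_err1;
DROP TABLE stg_orders_err2;

CREATE TABLE stg_orders (
    order_id    INTEGER,
    customer_id INTEGER,
    amount      DECIMAL(10,2)
) PRIMARY INDEX (order_id);

SET RECORD VARTEXT "|";

DEFINE order_id    (VARCHAR(11)),
       customer_id (VARCHAR(11)),
       amount      (VARCHAR(20))
FILE = orders.txt;

BEGIN LOADING stg_orders
      ERRORFILES stg_orders_err1, stg_orders_err2;

INSERT INTO stg_orders
VALUES (:order_id, :customer_id, :amount);

END LOADING;
LOGOFF;
```

```
/* Step 2: In a normal SQL session, move the data into the
   RI-constrained target; the foreign key is enforced here. */
INSERT INTO orders (order_id, customer_id, amount)
SELECT order_id, customer_id, amount
FROM   stg_orders;
```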

The FastLoad utility loads data into empty tables. Because it does not use transient journals, data can be loaded very quickly. It does not load duplicate rows, even if the target table is a MULTISET table.

Limitation - The target table must be empty and must not have secondary indexes, join indexes, or foreign key references.

How FastLoad Works - FastLoad divides its job into two phases, both designed for speed.

Phase 1 or Acquisition Phase

  • The primary purpose of Phase 1 is to get the data from the host computer into the Teradata system.
  • The Parsing Engines read the records from the input file and send a block to each AMP. The data moves in 64 KB blocks and is stored in work tables on the AMPs.
  • Each AMP stores the blocks of records.
  • The AMPs then hash each record and redistribute it to the correct AMP.
  • At the end of Phase 1 each AMP holds its own rows, but they are not yet in row-hash sequence.

Phase 2 or Application Phase

  • Phase 2 starts when FastLoad receives the END LOADING statement.
  • Each AMP sorts the records in its work table on row hash.
  • Each AMP writes the sorted rows to disk; the rows move from the work tables into the target table, where they permanently reside.
  • Rows of a table are stored on the disks in data blocks.
  • Locks on the target table are released and the error tables are dropped.

MultiLoad, however, does not have the same limitations. MultiLoad can load multiple tables at a time and can also perform different types of tasks such as INSERT, DELETE, UPDATE, and UPSERT. It can load up to five empty or populated target tables at a time, from either a LAN or a channel environment, and perform up to 20 DML operations in one script. Unlike FastLoad, the target table is not required to be empty. MultiLoad supports two modes:

  • IMPORT
  • DELETE

MultiLoad requires a work table, a log table, and two error tables in addition to the target table.

  • Log Table − Stores the processing record information during the load. This table contains one row for every MultiLoad running on the system.
  • Error Tables − Like FastLoad, MultiLoad also uses two error tables. The first, the Error Table (ET), contains all translation and constraint errors that occur while the data is being acquired from the source(s). The second, the Uniqueness Violation (UV) table, stores rows with duplicate values for Unique Primary Indexes (UPIs).
  • Work Table(s) − MultiLoad automatically creates one work table for each target table. In IMPORT mode MultiLoad can have one or more work tables; in DELETE mode there is only one. The purpose of the work tables is 1) to hold the DML tasks and 2) to hold the input data that is applied to the AMPs.

Limitation - MultiLoad has some limitations.

  • Unique Secondary Indexes are not supported on a target table: Like FastLoad, MultiLoad does not support Unique Secondary Indexes (USIs). Unlike FastLoad, it does support Non-Unique Secondary Indexes (NUSIs), because the index subtable row is on the same AMP as the data row.
  • Referential Integrity is not supported: Referential Integrity defined on a table would require extra system checking to enforce the referential constraints, so MultiLoad does not allow it.
  • Triggers are not supported at load time: Disable all triggers prior to running MultiLoad.
  • No concatenation of input files is allowed: It could impact a restart if the files were concatenated in a different sequence or data was deleted between runs.
  • No join indexes: All join indexes must be dropped before running MultiLoad and recreated after the load completes.
  • Will not process aggregates, arithmetic functions or exponentiation: If you need data conversions or math, you may be better off using an INMOD to prepare the data prior to loading it.
  • Target table: Target tables may contain data; MultiLoad can load into tables that are already populated.

How MultiLoad Works - A MultiLoad IMPORT task runs in five phases:

  • Phase 1 − Preliminary Phase: The basic setup phase, used for several preliminary set-up activities needed for a successful data load.
  • Phase 2 − DML Transaction Phase: Verifies the syntax of the DML statements and sends them to the Teradata system, where they are stored, since MultiLoad supports multiple DML functions.
  • Phase 3 − Acquisition Phase: Once setup completes, the PE's plan is stored on each AMP; the input data is brought into the work tables and the target tables are locked. The table headers and the actual input data are also stored in the work tables.
  • Phase 4 − Application Phase: All DML operations are applied to the target tables.
  • Phase 5 − Cleanup Phase: Table locks are released and all intermediate work tables are dropped.
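Putting the phases together, a minimal MultiLoad IMPORT script might look like the sketch below (run with the mload utility; every object and file name is invented, and the target table must not carry RI, USIs, join indexes, or triggers):

```
/* MultiLoad IMPORT sketch: upsert rows from a delimited file
   into a populated target table. Names are illustrative only. */
.LOGTABLE mldlog_orders;          /* restart log table */
.LOGON tdpid/user,password;

.BEGIN IMPORT MLOAD
       TABLES orders
       WORKTABLES wt_orders
       ERRORTABLES et_orders uv_orders;

.LAYOUT order_layout;
.FIELD order_id    * VARCHAR(11);
.FIELD customer_id * VARCHAR(11);
.FIELD amount      * VARCHAR(20);

.DML LABEL upsert_order
     DO INSERT FOR MISSING UPDATE ROWS;   /* UPSERT behavior */
UPDATE orders
SET    amount = :amount
WHERE  order_id = :order_id;
INSERT INTO orders (order_id, customer_id, amount)
VALUES (:order_id, :customer_id, :amount);

.IMPORT INFILE orders.txt
        FORMAT VARTEXT '|'
        LAYOUT order_layout
        APPLY upsert_order;

.END MLOAD;
.LOGOFF;
```

In DELETE mode the same framing applies, but the script carries a single DELETE statement instead of the IMPORT/APPLY section.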

TPump is short for Teradata Parallel Data Pump. As described above, FastLoad and MultiLoad load huge volumes of data in bulk. TPump instead loads data one row at a time, using row-hash locks. Because it locks at row level, and not at table level like MultiLoad, TPump can make many simultaneous, or concurrent, updates on a table. TPump performs INSERTs, UPDATEs, DELETEs, and UPSERTs from flat files into populated Teradata tables at row level.

TPump supports:

  • Secondary Indexes
  • Referential Integrity
  • Triggers
  • Join indexes
  • Pumping data in at varying, user-controlled rates.

Limitations

  • No concatenation of input data files is allowed.
  • TPump will not process aggregates, arithmetic functions or exponentiation.
  • The use of the SELECT function is not allowed.
  • No more than four IMPORT commands may be used in a single load task.
  • Dates before 1900 or after 1999 must be represented by the yyyy format for the year portion of the date, not the default format of yy.
  • On some network-attached systems, the maximum file size when using TPump is 2GB.
  • TPump performance will be diminished if Access Logging is used.

TPump uses a single error table. The error table does the following:

  • Identifies errors.
  • Provides some detail about the errors.
  • Stores a portion of the actual offending row for debugging.

Like the other utilities, a TPump script is fully restartable as long as the log table and error tables are not dropped.
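Since TPump is the only one of the three that tolerates Referential Integrity on the target, a TPump script is the direct answer to the original question. A minimal sketch follows (run with the tpump utility; all object and file names are invented):

```
/* TPump sketch: stream inserts into a populated table one row at a
   time using row-hash locks; RI, triggers, and secondary indexes
   may stay in place. Names are illustrative only. */
.LOGTABLE tplog_orders;
.LOGON tdpid/user,password;

.BEGIN LOAD
       SESSIONS 4
       PACK 20                 /* rows packed into one request */
       RATE 1000               /* max statements per minute */
       ERRORTABLE et_orders;

.LAYOUT order_layout;
.FIELD order_id    * VARCHAR(11);
.FIELD customer_id * VARCHAR(11);
.FIELD amount      * VARCHAR(20);

.DML LABEL ins_order;
INSERT INTO orders (order_id, customer_id, amount)
VALUES (:order_id, :customer_id, :amount);

.IMPORT INFILE orders.txt
        FORMAT VARTEXT '|'
        LAYOUT order_layout
        APPLY ins_order;

.END LOAD;
.LOGOFF;
```

Rows that violate the foreign key land in the error table rather than aborting the load, and the PACK and RATE settings let you throttle the row-at-a-time stream.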