Very bad performance on Non-Equi Join on Teradata


We have one table with transaction data (TRANSACTIONS), and one of its fields is the card number used in the transaction. I need to get the card's country by looking it up in the BIC_COUNTRY_RANGES table, which stores ranges of card numbers and the country for each range. So we have to join the two tables by finding the range that contains each card number (matching on an additional field as well, the card type). TRANSACTIONS has around 450k rows per day, and BIC_COUNTRY_RANGES has about 170k static rows.

TRANSACTIONS

OPERATION_ID  CARD_TYPE  CARD_NUMBER
1234          A          411389999000000001
5678          B          451716303000000001

BIC_COUNTRY_RANGES

CARD_TYPE  RANGE_START         RANGE_END           COUNTRY_ISO
A          411389999000000000  411389999999999999  US
B          451716303000000000  451716303999999999  AR

The join takes around 30 minutes to complete with just one day of data, and we need to run it on a full month of data.

We have indexes on CARD_TYPE and CARD_NUMBER in TRANSACTIONS, and on CARD_TYPE, RANGE_START, and RANGE_END in BIC_COUNTRY_RANGES. The query used to join them is as simple as:

SELECT *
FROM TRANSACTIONS T
LEFT JOIN BIC_COUNTRY_CODES B
  ON  B.RANGE_START <= T.CARD_NUMBER
  AND B.RANGE_END >= T.CARD_NUMBER
  AND B.CARD_TYPE = T.CARD_TYPE;

Any idea why it takes so long to complete? We managed to reduce the BIC table from 1 million rows to 168k by merging ranges, and if we replace the non-equi join with an equi-join (just for testing), it takes seconds. So it's something related to the range comparison, but we can't figure out what the problem is. We have checked the ranges and there don't seem to be any overlaps.
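For reference, an overlap check of this kind can be sketched with a window function. This is an illustrative query, not necessarily the one that was actually run; it assumes the table name used in the join (BIC_COUNTRY_CODES), the column names from the sample data, and that range bounds are inclusive.

```sql
-- Sketch: report every range that starts at or before the largest
-- RANGE_END seen so far within the same CARD_TYPE, i.e. a range that
-- overlaps some earlier range. Zero rows means no overlaps.
SELECT CARD_TYPE, RANGE_START, RANGE_END, PREV_MAX_END
FROM (
    SELECT CARD_TYPE,
           RANGE_START,
           RANGE_END,
           -- Largest RANGE_END among all earlier rows of this card type.
           MAX(RANGE_END) OVER (
               PARTITION BY CARD_TYPE
               ORDER BY RANGE_START, RANGE_END
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS PREV_MAX_END
    FROM BIC_COUNTRY_CODES
) R
WHERE PREV_MAX_END >= RANGE_START;
```

Comparing against the running maximum rather than only the immediately preceding row also catches a short range nested inside a much longer earlier one.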

Here is the EXPLAIN of the query:


  1) First, we lock TRANSACTIONS in TD_MAP1 for read on a
     reserved RowHash to prevent global deadlock.
  2) Next, we lock BIC_COUNTRY_CODES in TD_MAP1 for read on a reserved
     RowHash to prevent global deadlock.
  3) We lock TRANSACTIONS in TD_MAP1 for read, and we lock
     BIC_COUNTRY_CODES in TD_MAP1 for read.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step in TD_MAP1 from BIC_COUNTRY_CODES
          by way of an all-rows scan with a condition of ("NOT
          (BIC_COUNTRY_CODES.CARD_TYPE IS NULL)") into Spool 2
          (all_amps), which is redistributed by the hash code of (
          BIC_COUNTRY_CODES.CARD_TYPE) to all AMPs in TD_Map1.  Then
          we do a SORT to order Spool 2 by row hash.  The size of Spool
          2 is estimated with low confidence to be 167,424 rows (
          9,208,320 bytes).  The estimated time for this step is 0.02
          seconds.
       2) We do an all-AMPs RETRIEVE step in TD_MAP1 from
          TRANSACTIONS by way of an all-rows scan with no
          residual conditions into Spool 3 (all_amps), which is
          redistributed by the hash code of (
          TRANSACTIONS.CARD_TYPE) to all AMPs in TD_Map1.
          Then we do a SORT to order Spool 3 by row hash.  The size of
          Spool 3 is estimated with low confidence to be 475,776 rows (
          16,176,384 bytes).  The estimated time for this step is 0.03
          seconds.
  5) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by
     way of a RowHash match scan, which is joined to Spool 3 (Last Use)
     by way of a RowHash match scan.  Spool 2 and Spool 3 are
     right outer joined using a merge join, with condition(s) used for
     non-matching on right table ("NOT (CARD_TYPE IS NULL)"),
     with a join condition of ("(RANGE_START <= CARD_NUMBER) AND
     ((RANGE_END >= CARD_NUMBER) AND (CARD_TYPE =
     CARD_TYPE ))").  The result goes into Spool 1
     (group_amps), which is built locally on the AMPs.  The size of
     Spool 1 is estimated with no confidence to be 8,642,638 rows (
     777,837,420 bytes).  The estimated time for this step is 0.12
     seconds.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.15 seconds.
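In this plan both spools are redistributed by the hash of CARD_TYPE alone, and the range predicates appear only as part of the join condition in step 5, so within each card type every transaction row may be compared against every range row. A rough way to gauge how many candidate pairs that implies is to count rows per card type on both sides. The query below is a diagnostic sketch under that reading of the plan; the aliases TXN_ROWS, RANGE_ROWS, and CANDIDATE_PAIRS are illustrative names only.

```sql
-- Sketch: estimate the comparisons a CARD_TYPE-only match would need.
SELECT T.CARD_TYPE,
       T.TXN_ROWS,
       R.RANGE_ROWS,
       -- Cast before multiplying so the product cannot overflow INTEGER.
       CAST(T.TXN_ROWS AS DECIMAL(18,0)) * R.RANGE_ROWS AS CANDIDATE_PAIRS
FROM (
    SELECT CARD_TYPE, COUNT(*) AS TXN_ROWS
    FROM TRANSACTIONS
    GROUP BY CARD_TYPE
) T
JOIN (
    SELECT CARD_TYPE, COUNT(*) AS RANGE_ROWS
    FROM BIC_COUNTRY_CODES
    GROUP BY CARD_TYPE
) R
  ON R.CARD_TYPE = T.CARD_TYPE
ORDER BY CANDIDATE_PAIRS DESC;
```

If CARD_TYPE has only a handful of distinct values, the totals here would be far larger than the 8,642,638 rows the optimizer estimates for Spool 1, and the same low cardinality would also concentrate the redistributed rows on a few AMPs.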

Thanks
