Teradata - Derived table

328 Views Asked by At

The below query consumes more than 1 TB of spool space as the Col11 column has more than 5 million values.

SELECT A.COL1, B.COL2,
FROM TAB1 A 
JOIN TAB2  B 
ON B.COL2= A.COL3 AND B.COL4= A.COL4 AND B.COL6= 'XYZ'
JOIN TAB3 D 
ON D.COL5= A.COL5 AND D.COL4= A.COL4 AND D.COL6= 'EFG'
JOIN VIEW1 C 
ON A.COL9= C.COL9 AND A.COL4= C.COL4 AND C.COL6='EFG' AND A.COL7= C.COL8
JOIN Vpd VPRD ON VPRD.COL10=D.COL10
WHERE A.COL4= 2017
AND ((C.COL8 IN('13')) OR (((C.COL8 IN('01', '1B', '8E')))))
AND ( ((B.COL11= 'ABC')))
AND ( ( A.COL1 BETWEEN '2017/07/01' AND '2017/08/01') 
OR ( A.COL1 BETWEEN '2016/07/01' AND '2016/09/01') )
GROUP BY 1,2;

Its explain plan is as below:

6) We do an all-AMPs RETRIEVE step from VPRD by way of an
    all-rows scan with a condition of ("NOT
(VPRD.COL10 IS NULL)") into Spool 4 (all_amps)
(compressed columns allowed), which is duplicated on all AMPs.
The size of Spool 4 is estimated with high confidence to be 2,160
rows (45,360 bytes).  The estimated time for this step is 0.01
seconds.
7) We do an all-AMPs JOIN step from a single partition of
TAB2 by way of index # 8 "TAB2.COL4 = 2017,
TAB2.COL6 = 'XYZ ', TAB2.COL11 = 'ABC'" with no residual
conditions, which is joined to Spool 4 (Last Use) by way of an
all-rows scan.  TAB2 and Spool 4 are joined using a
product join, with a join condition of ("(1=1)").  The result goes
into Spool 5 (all_amps) (compressed columns allowed), which is
duplicated on all AMPs.  Then we do a SORT to order Spool 5 by the
hash code of (TAB2.COL4,VPRD.COL10, 'EFG').  The size of Spool 5
is estimated with high confidence to be 60,480 rows (1,814,400
bytes).  The estimated time for this step is 0.01 seconds.
8) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of
an all-rows scan, which is joined to TAB3 by way of
a traversal of index # 16 without accessing the base table
extracting row ids only.  Spool 5 and TAB3 are
joined using a nested join, with a join condition of (
"(COL4 = TAB3.COL4) AND ((COL10 = TAB3.COL10) AND
(TAB3.COL6 = ('EFG' )))").  The result
goes into Spool 6 (all_amps), which is built locally on the
AMPs.  Then we do a SORT to order Spool 6 by field Id 1.  The
size of Spool 6 is estimated with low confidence to be
162,567 rows (6,340,113 bytes).  The estimated time for this
step is 0.01 seconds.
2) We do an all-AMPs RETRIEVE step from
TAB3 by way of an all-rows scan
with a condition of  "(TAB3.COL4 = 2017) AND
((TAB3.COL6 =   'EFG') AND((TAB3.COL8 IN
('13','01','1B','8E')) AND  (TAB3.HRCY_LVL_ID = 'TERR')))")
into Spool 7 (all_amps) (compressed columns allowed), which
is duplicated on all AMPs.  Spool 7 is built as in-memory
optimized spool with 4 column partitions.  The size of Spool
7 is estimated with low confidence to be 187,380 rows (
8,994,240 bytes).  The estimated time for this step is 0.01
seconds.
9) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
all-rows scan, which is joined to a single partition of
TAB3 with a condition of ("(TAB3.COL4 =
2017) AND (TAB3.COL6 = 'EFG')").  Spool 6
and TAB3 are joined using a row id join, with a join
condition of ("COL10 = TAB3.COL10").
The result goes into Spool 8 (all_amps) (compressed columns
allowed), which is redistributed by the rowkey of (
TAB3.COL5)), TAB2.COL2) to all
AMPs.  Spool 8 is built as in-memory optimized spool with 4 column
partitions.  The size of Spool 8 is estimated with low confidence
to be 162,567 rows (5,364,711 bytes).  The estimated time for this
step is 0.04 seconds.
10) We do an all-AMPs JOIN step from Spool 7 (Last Use), which is
joined to 5 partitions of TAB1 with a condition of (
"(TAB1.COL4 = 2017) AND
((TAB1.COL1 IN (DATE '2016-07-01'TO DATE
'2016-09-01',DATE '2017-07-01'TO DATE '2017-08-01')) AND
(TAB1.COL7 IN ('13','8E','1B','01')))").
Spool 7 and TAB1 are joined using a in-memory dynamic
hash join, with a join condition of (
"(TAB1.COL9 = COL9) AND ((TAB1.COL4 = COL4) AND
(TAB1.COL7 = COL8 ))").  The result
goes into Spool 9 (all_amps) (compressed columns allowed), which
is redistributed by the rowkey of (TAB1.COL5)),
TAB1.COL3) to all AMPs.  Spool 9 is built as
in-memory optimized spool with 4 column partitions.  The size of
Spool 9 is estimated with low confidence to be 7,493,759 rows (
277,269,083 bytes).  The estimated time for this step is 0.45
seconds.
11) We do an all-AMPs JOIN step from Spool 8 (Last Use), which is
joined to Spool 9 (Last Use).  Spool 8 and Spool 9 are joined
using a single partition in-memory hash join, with a join
condition of ("(COL5 = COL5) AND (((COL4 =
COL4) AND ((COL4 = COL4) AND ((COL4 =
COL4) AND (COL4 = COL4 )))) AND (COL2 =
COL3 ))").  The result goes into Spool 3 (all_amps)
(compressed columns allowed), which is built locally on the AMPs.
The size of Spool 3 is estimated with low confidence to be 385,641
rows (8,869,743 bytes).  The estimated time for this step is 0.03
seconds.
12) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
way of an all-rows scan , grouping by field1 (
TAB1.COL1 ,TAB2.COL2).
Aggregate Intermediate Results are computed globally, then placed
in Spool 1.  The size of Spool 1 is estimated with low confidence
to be 169 rows (6,253 bytes).  The estimated time for this step is
0.02 seconds.
13) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.  The total estimated time is 0.57 seconds.

If I format the above query in the below fashion, I am getting the output with very less spool space

SELECT * FROM(
SELECT A.COL1,
CASE WHEN (B.COL11= 'ABC') THEN B.COL2 
ELSE NULL END AS COL2
FROM TAB1 A 
JOIN TAB2  B 
ON B.COL2= A.COL3 AND B.COL4= A.COL4 AND B.COL6= 'XYZ'
JOIN TAB3 D 
ON D.COL5= A.COL5 AND D.COL4= A.COL4 AND D.COL6= 'EFG'
JOIN VIEW1 C 
ON A.COL9= C.COL9 AND A.COL4= C.COL4 AND C.COL6='EFG' AND A.COL7= C.COL8
JOIN Vpd VPRD ON VPRD.COL10=D.COL10
WHERE A.COL4= 2017
AND ((C.COL8 IN('13')) OR (((C.COL8 IN('01', '1B', '8E')))))
AND ( ((B.COL11= 'ABC')))
AND ( ( A.COL1 BETWEEN '2017/07/01' AND '2017/08/01') 
OR ( A.COL1 BETWEEN '2016/07/01' AND '2016/09/01') )
GROUP BY 1,2)X
WHERE X.COL2 IS NOT NULL

Its explain plan is as below

6) We do an all-AMPs RETRIEVE step from VPRD by way of an
all-rows scan with a condition of ("NOT
(VPRD.COL10 IS NULL)") into Spool 4 (all_amps)
(compressed columns allowed), which is duplicated on all AMPs.
Then we do a SORT to order Spool 4 by the hash code of (
VPRD.COL10, 'EFG', 2017).  The size of
Spool 4 is estimated with high confidence to be 2,160 rows (
47,520 bytes).  The estimated time for this step is 0.00 seconds.
7) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of
an all-rows scan, which is joined to TAB3 by way of
a traversal of index # 16 without accessing the base table
extracting row ids only.  Spool 4 and TAB3 are
joined using a nested join, with a join condition of (
"(COL10 = TAB3.COL10) AND ((TAB3.COL4 = 2017) AND
(TAB3.COL6 = ('EFG' )))").  The result
goes into Spool 5 (all_amps), which is built locally on the
AMPs.  Then we do a SORT to order Spool 5 by field Id 1.  The
size of Spool 5 is estimated with no confidence to be 5,806
rows (179,986 bytes).  The estimated time for this step is
0.01 seconds.
2) We do an all-AMPs RETRIEVE step from
TAB3 by way of an all-rows
scan with a condition of  "(TAB3.COL4 = 2017) AND
((TAB3.COL6 =   'EFG') AND((TAB3.COL8 IN
('13','01','1B','8E')) AND  (TAB3.HRCY_LVL_ID = 'TERR')))")
into Spool 6 (all_amps) (compressed columns
allowed), which is duplicated on all AMPs.  Spool 6 is built
as in-memory optimized spool with 4 column partitions.  The
size of Spool 6 is estimated with low confidence to be
187,380 rows (8,994,240 bytes).  The estimated time for this
step is 0.01 seconds.
8) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of an
all-rows scan, which is joined to a single partition of
TAB3 with a condition of ("(TAB3.COL4 =
2017) AND (TAB3.COL6 = 'EFG')").  Spool 5
and TAB3 are joined using a row id join, with a join
condition of ("COL10 = TAB3.COL10").
The result goes into Spool 7 (all_amps) (compressed columns
allowed), which is duplicated on all AMPs.  Spool 7 is built as
in-memory optimized spool with 4 column partitions.  The size of
Spool 7 is estimated with no confidence to be 1,045,080 rows (
30,307,320 bytes).  The estimated time for this step is 0.08
seconds.
9) We do an all-AMPs JOIN step from Spool 6 (Last Use), which is
joined to 5 partitions of TAB1 with a condition of (
"(TAB1.COL4 = 2017) AND ((TAB1.COL1 IN (DATE '2016-07-01'TO DATE
'2016-09-01',DATE '2017-07-01'TO DATE '2017-08-01')) AND
(TAB1.COL7 IN ('13','8E','1B','01')))").
Spool 6 and TAB1 are joined using a in-memory dynamic
hash join, with a join condition of (
"(TAB1.COL9 = COL9) AND ((TAB1.COL4 = COL4) AND
(TAB1.COL7 = COL8 ))").  The result
goes into Spool 8 (all_amps) (compressed columns allowed), which
is built locally on the AMPs.  Spool 8 is built as in-memory
optimized spool with 4 column partitions.  The size of Spool 8 is
estimated with low confidence to be 7,493,759 rows (337,219,155
bytes).  The estimated time for this step is 0.28 seconds.
10) We do an all-AMPs JOIN step from Spool 7 (Last Use), which is
joined to Spool 8 (Last Use).  Spool 7 and Spool 8 are joined
using a single partition in-memory hash join, with a join
condition of ("(COL5 = COL5) AND ((COL4 =
COL4) AND (COL4 = COL4 ))").  The result goes into
Spool 9 (all_amps) (compressed columns allowed), which is built
locally on the AMPs.  Spool 9 is built as in-memory optimized
spool with 4 column partitions.  The size of Spool 9 is estimated
with no confidence to be 182,784 rows (6,763,008 bytes).  The
estimated time for this step is 0.03 seconds.
11) We do an all-AMPs RETRIEVE step from a single partition of
TAB2 with a condition of ("TAB2.COL4 = 2017")
with a residual condition of ("(TAB2.COL4 = 2017)
AND ((TAB2.COL6 = 'XYZ ') AND (NOT (( CASE
WHEN (TAB2.COL11 = 'ABC') THEN
(TAB2.COL2) ELSE (NULL) END )IS NULL )))") into
Spool 10 (all_amps) (compressed columns allowed), which is
duplicated on all AMPs.  Spool 10 is built as in-memory optimized
spool with 4 column partitions.  The size of Spool 10 is estimated
with no confidence to be 2,384,460 rows (88,225,020 bytes).  The
estimated time for this step is 0.09 seconds.
12) We do an all-AMPs JOIN step from Spool 9 (Last Use), which is
joined to Spool 10 (Last Use).  Spool 9 and Spool 10 are joined
using a single partition in-memory hash join, with a join
condition of ("(COL4 = COL4) AND (((COL4 = COL4) AND (COL4 = COL4 )) 
AND (COL2 = COL3 ))").  The result goes into Spool 3 (all_amps)
(compressed columns allowed), which is built locally on the AMPs.
The size of Spool 3 is estimated with no confidence to be 15,701
rows (486,731 bytes).  The estimated time for this step is 0.01
seconds.
13) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
way of an all-rows scan , grouping by field1 (
TAB1.COL1 ,( CASE WHEN (TAB2.COL11 = 'ABC') THEN
(TAB2.COL2) ELSE (NULL) END)).  Aggregate
Intermediate Results are computed globally, then placed in Spool 1.
The size of Spool 1 is estimated with no confidence to be 13,247
rows (490,139 bytes).  The estimated time for this step is 0.02
seconds.
14) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.  The total estimated time is 0.52 seconds.

Can someone please tell me the difference it creates while execution? How the derived table consumes less spool space when compared to the original query? Thanks

0

There are 0 best solutions below