Teradata optimizer generating an IS NOT NULL check for NOT NULL columns

66 Views Asked by At

This is less of a problem and more of a question. I wish to understand why the optimizer creates a NOT(COLUMN_NAME IS NULL) condition for a column that is NOT NULL.

I have checked the query on two separate systems and even if the overall explain was different, the condition was still there.

Here's the SQL for creating a sample case:

CREATE VOLATILE TABLE Test (
    SomeInteger INTEGER NOT NULL,
    SomeOtherInteger INTEGER NOT NULL,
    SomeDate DATE FORMAT 'YYYY-MM-DD' NOT NULL,
    SomeValue VARCHAR(255) NOT NULL)
PRIMARY INDEX (SomeInteger, SomeOtherInteger)
ON COMMIT PRESERVE ROWS;

INSERT INTO Test VALUES (1,1,'2022-11-08','VBAZVVVZDVB');

SELECT
    MAIN.SomeInteger,
    MAIN.SomeOtherInteger,
    MAIN.SomeDate,
    MAIN.SomeValue,
    Coalesce(SJOIN.SomeIdentifier, 'Y') AS SomeIdentifier
FROM
    Test MAIN
    
    LEFT OUTER JOIN
        (SELECT
            SomeInteger,
            SomeOtherInteger,
            'X' SomeIdentifier
        FROM
            (SELECT DISTINCT
                SomeInteger,
                SomeOtherInteger,
                SomeValue
            FROM
                TEST) AS SUBQ
        HAVING
            Count(*) > 1
        GROUP BY SomeInteger, SomeOtherInteger) AS SJOIN
    ON SJOIN.SomeInteger = MAIN.SomeInteger
    AND SJOIN.SomeOtherInteger = MAIN.SomeOtherInteger;

Here's the EXPLAIN plan (TD Vantage Express 17.20):

 Explanation                                                              
 ------------------------------------------------------------------------ 
   1) First, we do an all-AMPs SUM step in TD_DATADICTIONARYMAP to       
      aggregate from DBC.TEST by way of an all-rows scan with no         
      residual conditions, grouping by field1 (DBC.TEST.SomeInteger      
      ,DBC.TEST.SomeOtherInteger ,DBC.TEST.SomeValue).  Aggregate        
      intermediate results are computed locally, then placed in Spool 5  
      in TD_DataDictionaryMap.  The size of Spool 5 is estimated with    
      high confidence to be 1 row (293 bytes).  The estimated time for   
      this step is 0.00 seconds.                                         
   2) Next, we do an all-AMPs RETRIEVE step in TD_DataDictionaryMap from 
      Spool 5 (Last Use) by way of an all-rows scan into Spool 1 (used   
      to materialize view, derived table, table function or table        
      operator SUBQ) (all_amps), which is built locally on the AMPs.     
      The size of Spool 1 is estimated with high confidence to be 1 row  
      (29 bytes).  The estimated time for this step is 0.00 seconds.     
   3) We do an all-AMPs SUM step in TD_DataDictionaryMap to aggregate    
      from Spool 1 (Last Use) by way of an all-rows scan with a          
      condition of ("(NOT (SUBQ.SOMEOTHERINTEGER IS NULL )) AND (NOT     
      (SUBQ.SOMEINTEGER IS NULL ))"), grouping by field1 (               
      DBC.TEST.SOMEINTEGER ,DBC.TEST.SOMEOTHERINTEGER).  Aggregate       
      intermediate results are computed locally, then placed in Spool 8  
      in TD_DataDictionaryMap.  The size of Spool 8 is estimated with    
      low confidence to be 1 row (37 bytes).  The estimated time for     
      this step is 0.00 seconds.                                            
[..]

Even though all of the columns in the table are marked as NOT NULL and in no point are any additional rows created that could contain NULL values, the condition is still generated for columns that are included in the JOIN condition. Why is that?

0

There are 0 best solutions below