This is less of a problem and more of a question. I wish to understand why the optimizer creates a NOT(COLUMN_NAME IS NULL) condition for a column that is NOT NULL.
I have checked the query on two separate systems and even if the overall explain was different, the condition was still there.
Here's the SQL for creating a sample case:
CREATE VOLATILE TABLE Test (
SomeInteger INTEGER NOT NULL,
SomeOtherInteger INTEGER NOT NULL,
SomeDate DATE FORMAT 'YYYY-MM-DD' NOT NULL,
SomeValue VARCHAR(255) NOT NULL)
PRIMARY INDEX (SomeInteger, SomeOtherInteger)
ON COMMIT PRESERVE ROWS;
INSERT INTO Test VALUES (1,1,'2022-11-08','VBAZVVVZDVB');
SELECT
MAIN.SomeInteger,
MAIN.SomeOtherInteger,
MAIN.SomeDate,
MAIN.SomeValue,
Coalesce(SJOIN.SomeIdentifier, 'Y') AS SomeIdentifier
FROM
Test MAIN
LEFT OUTER JOIN
(SELECT
SomeInteger,
SomeOtherInteger,
'X' SomeIdentifier
FROM
(SELECT DISTINCT
SomeInteger,
SomeOtherInteger,
SomeValue
FROM
TEST) AS SUBQ
HAVING
Count(*) > 1
GROUP BY SomeInteger, SomeOtherInteger) AS SJOIN
ON SJOIN.SomeInteger = MAIN.SomeInteger
AND SJOIN.SomeOtherInteger = MAIN.SomeOtherInteger;
Here's the EXPLAIN plan (TD Vantage Express 17.20):
Explanation
------------------------------------------------------------------------
1) First, we do an all-AMPs SUM step in TD_DATADICTIONARYMAP to
aggregate from DBC.TEST by way of an all-rows scan with no
residual conditions, grouping by field1 (DBC.TEST.SomeInteger
,DBC.TEST.SomeOtherInteger ,DBC.TEST.SomeValue). Aggregate
intermediate results are computed locally, then placed in Spool 5
in TD_DataDictionaryMap. The size of Spool 5 is estimated with
high confidence to be 1 row (293 bytes). The estimated time for
this step is 0.00 seconds.
2) Next, we do an all-AMPs RETRIEVE step in TD_DataDictionaryMap from
Spool 5 (Last Use) by way of an all-rows scan into Spool 1 (used
to materialize view, derived table, table function or table
operator SUBQ) (all_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with high confidence to be 1 row
(29 bytes). The estimated time for this step is 0.00 seconds.
3) We do an all-AMPs SUM step in TD_DataDictionaryMap to aggregate
from Spool 1 (Last Use) by way of an all-rows scan with a
condition of ("(NOT (SUBQ.SOMEOTHERINTEGER IS NULL )) AND (NOT
(SUBQ.SOMEINTEGER IS NULL ))"), grouping by field1 (
DBC.TEST.SOMEINTEGER ,DBC.TEST.SOMEOTHERINTEGER). Aggregate
intermediate results are computed locally, then placed in Spool 8
in TD_DataDictionaryMap. The size of Spool 8 is estimated with
low confidence to be 1 row (37 bytes). The estimated time for
this step is 0.00 seconds.
[..]
Even though all of the columns in the table are marked as NOT NULL and in no point are any additional rows created that could contain NULL values, the condition is still generated for columns that are included in the JOIN condition. Why is that?