Teradata SQL join with approximate match by nearest Neighbor

129 Views Asked by At

I have two tables table1 and table2 I need to join these tables with keys that are not exactly matched, they could differ in the last 3 or 4 digits there is no fixed pattern, the joining will have two conditions:

  • 1st Condition: table2.Key_2 must be numerically greater than table1.Key_1 so join will will search for nearest large Key
  • 2nd Condition: table1 could have two keys approximately matched with one key in table2 so the join will consider the nearest large one the other return Null

|Key_1             |
|------------------|
|111330317223636588|
|121199074223629055|
|47256714523667238 |
|101348062023590858|
|106331320423644206|
|106061712623646625|
|120562195823631381|
|155570817823642550|
|114948476223640334|
|103285939423669298|
|103285939423669910|
|Key_2              |Value|
|-------------------+-----|
|111330317223636610 |Done |
|121199074223629090 |Done |
|47256714523667300  |Done |
|101348062023590900 |Done |
|106331320423644305 |Done |
|106061712623647100 |Done |
|120562195823631399 |Done |
|155570817823642677 |Done |
|114948476223640455 |Done |
|103285939423669988 |Done |

the expected result will be like the below:

|Key_1              |Value|
|-------------------+-----|
|111330317223636588 |Done |
|121199074223629055 |Done |
|47256714523667238  |Done |
|101348062023590858 |Done |
|106331320423644206 |Done |
|106061712623646625 |Done |
|120562195823631381 |Done |
|155570817823642550 |Done |
|114948476223640334 |Done |
|103285939423669298 |NULL | 
|103285939423669910 |Done |
1

There are 1 best solutions below

2
On BEST ANSWER

This join on the next value is hard to express as a join, but simple using this approach:

with cte as
 ( -- combine both key columns into one 
   select key_1, cast(null as varchar(10)) as val from table1
   union all
   select key_2 as key_1, "value" from table2
  -- order by 1
 )
select 
   key_1
   -- return the next rows value
  ,lead(val)
   over (order by key_1) as "value"
  ,val
from cte
-- filter only rows from the first table
qualify val is null

The cte combines both tables:

 Key_1              val <-- NULLs indicate rows from table_1
 ------------------ ----- 
  47256714523667238 NULL 
  47256714523667300 Done  
 101348062023590858 NULL  
 101348062023590900 Done  
 103285939423669298 NULL
 103285939423669910 NULL  
 103285939423669988 Done  
 106061712623646625 NULL  
 106061712623647100 Done  
 106331320423644206 NULL  
 106331320423644305 Done  
 111330317223636588 NULL  
 111330317223636610 Done  
 114948476223640334 NULL  
 114948476223640455 Done  
 120562195823631381 NULL  
 120562195823631399 Done  
 121199074223629055 NULL  
 121199074223629090 Done  
 155570817823642550 NULL  
 155570817823642677 Done  

LEAD finds the next row's value:

 Key_1              val  value 
 ------------------ ---- ----- 
  47256714523667238 NULL Done 
  47256714523667300 Done NULL 
 101348062023590858 NULL Done 
 101348062023590900 Done NULL 
 103285939423669298 NULL NULL 
 103285939423669910 NULL Done 
 103285939423669988 Done NULL 
 106061712623646625 NULL Done 
 106061712623647100 Done NULL 
 106331320423644206 NULL Done 
 106331320423644305 Done NULL 
 111330317223636588 NULL Done 
 111330317223636610 Done NULL 
 114948476223640334 NULL Done 
 114948476223640455 Done NULL 
 120562195823631381 NULL Done 
 120562195823631399 Done NULL 
 121199074223629055 NULL Done 
 121199074223629090 Done NULL 
 155570817823642550 NULL Done 
 155570817823642677 Done NULL 

and the final QUALIFY removes all rows from the 2nd table

 Key_1              value 
 ------------------ ----- 
  47256714523667238 Done 
 101348062023590858 Done 
 103285939423669298 NULL 
 103285939423669910 Done 
 106061712623646625 Done 
 106331320423644206 Done 
 111330317223636588 Done 
 114948476223640334 Done 
 120562195823631381 Done 
 121199074223629055 Done 
 155570817823642550 Done