How to join two tables by dependent match keys in BigQuery?

221 Views Asked by At

I have two tables in BigQuery First one is a list of rates. Rates have default values with source equal -1 for each combo code - offer. Apart from combo code - offer, some rates have specified source

Second table has same columns as first table except rates + any other data.

My goal join rates by matched code - offer - source otherwise use default rate by matched code - offer with source equal -1

In example query returns default rates only:

    WITH t1 AS (SELECT 21 as source, 'SA' as code, 'offer1' as offer, 2.4 as rate 
            UNION ALL
            SELECT 33, 'SA', 'offer1', 2.5
            UNION ALL
            SELECT 39, 'SA', 'offer1', 2.1
            UNION ALL
            SELECT -1, 'SA', 'offer1', 3
            UNION ALL
            SELECT -1, 'SA', 'offer2', 4
            UNION ALL
            SELECT 47, 'YN', 'offer1', 2.7
            UNION ALL
            SELECT -1, 'YN', 'offer1', 5.4
            UNION ALL
            SELECT -1, 'YN', 'offer2', 0.9
            UNION ALL
            SELECT -1, 'RE', 'offer1', 5.7
            UNION ALL
            SELECT -1, 'RE', 'offer2', 3.4),
t2 as (SELECT 21 as source, 'SA' as code, 'offer1' as offer, "any data" as other_columns
        UNION ALL SELECT 21, 'SA', 'offer1', "any data"
        UNION ALL SELECT 21, 'SA', 'offer1', "any data"
        UNION ALL SELECT 21, 'SA', 'offer2', "any data"
        UNION ALL SELECT 47, 'YN', 'offer1', "any data"
        UNION ALL SELECT 47, 'YN', 'offer2', "any data"
        UNION ALL SELECT 50, 'YN', 'offer1', "any data"
        UNION ALL SELECT 47, 'YN', 'offer2', "any data"
        UNION ALL SELECT 78, 'RE', 'offer1', "any data"
        UNION ALL SELECT 66, 'RE', 'offer2', "any data")
        
        
SELECT t2.*, rate FROM t2
LEFT JOIN t1 ON t1.offer = t2.offer AND t1.code = t2.code AND IF (t1.source = t1.source AND rate IS NULL, t1.source = t2.source, t1.source = - 1)

Next query returns rates with specified source and null when source did not match

SELECT t2.*, rate FROM t2
    LEFT JOIN t1 ON t1.offer = t2.offer AND t1.code = t2.code AND IF (t1.source = t1.source AND rate IS NOT NULL, t1.source = t2.source, t1.source = - 1) 

How can I join rates correct?

2

There are 2 best solutions below

1
On BEST ANSWER

You can left join twice and use conditional logic:

select t2.*, coalesce(t11.rate, t12.rate) rate
from t2
left join t1 t11
    on  t11.code = t2.code
    and t11.offer = t2.offer 
    and t11.source = t2.source
left join t1 t12 
    on  t12.code = t2.code
    and t12.offer = t2.offer
    and t12.source = -1
    and t11.code is null
1
On

Below is for BigQuery Standard SQL

#standardSQL
select any_value(t2).*, 
  array_agg(rate order by t1.source = t2.source desc, t1.source = -1 desc limit 1)[offset(0)] rate
from t2
left join t1 
on  t1.code = t2.code
and t1.offer = t2.offer 
group by format('%t', t2)   

if applied to sample data from your question - output is as below

enter image description here

Above avoids double joining, the only side effect here is - result is deduped - meaning duplicate rows - which are present in the table 2 - are deduped / eliminated

I need duplicate rows

Sure, just almost no changes to above gives you all rows

#standardSQL
select any_value(t2).*, 
  array_agg(rate order by t1.source = t2.source desc, t1.source = -1 desc limit 1)[offset(0)] rate
from t2, unnest([rand()]) as r 
left join t1 
on  t1.code = t2.code
and t1.offer = t2.offer 
group by format('%t', t2), r   

with output

enter image description here