BigQuery Code Unexpected results with code formatting sqlfluff

572 Views Asked by At

I have a perfect code that compares the data from one table with another (see below) which works totally fine and runs fine as well in BigQuery:

with source1 as (
select 
b.id, 
b.qty, 
a.price 
from <table> as a
,unnest <details> as b
 where b.status != 'canceled'
),

source2 as (
select id_, qty_, price_  from <table2>
where city != 'delhi'
) 

select *
from source1 s1
full outer join source2 s2
on id = id_
where format('%t', s1) != format('%t', s2)

However, the code above runs into an error in sqlfluff i.e a certain SQL formatting rules checker that I can't bypass or turn off, see the error from sqlfluff below:

ERROR FROM SQLFLUFF:

*'s1' found in select with more than one referenced table/view' and 's2' found in select with more than one referenced table/view

Does anybody know how I can fix it ?

2

There are 2 best solutions below

0
On

Windows function may perform better.

Instead of joining, the tables are unioned. Then a window function will search for combinations. As you tagged this question [Big-Query] it is tested for BigQuery:

with s1 as (
select id, qty, city from <table> where x != 'pending'
),

s2 as (
select id_, qty_, city_  from <table2>
),

concat_ as (
  select 1 as dummy, * , format('%t',s1) as dummy_all
  from s1
union all select 2 as dummy, * , format('%t',s2) as dummy_all
  from s2
)
,combine as (
  select *,
    sum(if(dummy=1,1,0)) over win1 = sum(if(dummy=2,1,0)) over win1 as dummy_flag
  from concat_
  window win1 as (partition by id,dummy_all)
  )

  Select * from combine
  where dummy_flag is false
0
On

I fixed the code by adding extra CTE and adjusting the where caluse this would not flag the sqlfluff rules:

To fix that, I tried the second code (see below):

  • added additional ctes
  • adjusted the where clause so that I can get all the rows where the id exists in one source but not in another and vice versa

the code seems to work but it would be great if someone could suggest how I can reduce the CTEs, considering the sqlfluff will not fail:

with s1 as (
select 
b.id, 
b.qty, 
a.price 
from <table> as a
,unnest <details> as b
 where b.status != 'canceled'
),

s2 as (
select id_, qty_, price_  from <table2>
where city != 'delhi'
),

,concat_s1 as (
  select
    *
    , format('%t',s1) as l1
  from s1
)

,concat_s2 as (

  select
    *
    , format('%t',s2) as l2
  from s2
)

, combined as (
  select
    source1.*
    ,source2.*
  from concat_s1 as source1
  full outer join concat_s2 as source2
    on source1.id_ = source2.id_
  where source1.l1 != source2.l2
    or source1.id is null or source2.id_ is null