Lateral Join returns too many rows

45 Views Asked by At

I try to understand LATERAL JOIN with this query:

select m1.contributor_prescription, derniere_publication.date_publication_prescription 
from activite.metadonnee m1
left join lateral 
(select date_publication_prescription, m2.contributor_prescription 
from activite.metadonnee m2 
where m1.contributor_prescription = m2.contributor_prescription 
order by date_publication_prescription desc
limit 3) as derniere_publication
ON true
order by m1.contributor_prescription

The goal is to have for each contributor_prescription the 3 latest date_publication_prescription.

For now, the result is all contributor_prescription rows and the last date_publication_prescription for all.

contributor_prescription date_publication_prescription
john 22/10/2023
john 22/10/2023
(as many times as john exists in the table)
paul 24/10/2023
paul 24/10/2023
(as many times as paul exists in the table)

expected:

contributor_prescription date_publication_prescription
john 22/10/2023
john 21/10/2023
john 20/10/2023
paul 24/10/2023
paul 23/10/2023
paul 21/10/2023
1

There are 1 best solutions below

2
Zegarek On BEST ANSWER

If you want the top 3 most recent per contributor_prescription you can order them with row_number(), then keep only the top 3 using a plain where: demo

with cte as (
    select contributor_prescription, 
           date_publication_prescription,
           row_number()over w1 as rn
    from activite.metadonnee m1
    window w1 as (partition by contributor_prescription
                      order by date_publication_prescription desc) )
select contributor_prescription,
       date_publication_prescription
from cte 
where rn<=3
order by 1,2 desc;
contributor_prescription date_publication_prescription
john 2023-10-22
john 2023-10-21
john 2023-10-20
paul 2023-10-24
paul 2023-10-23
paul 2023-10-21

Or you could do what you did, but don't join straight to the source table, but to distinct contributor_prescription:

select m1.contributor_prescription, 
       derniere_publication.date_publication_prescription 
from (select distinct contributor_prescription from activite.metadonnee) m1
left join lateral 
(   select  date_publication_prescription, 
            m2.contributor_prescription 
    from activite.metadonnee m2 
    where m1.contributor_prescription = m2.contributor_prescription 
    order by date_publication_prescription desc
    limit 3) as derniere_publication
ON true
order by m1.contributor_prescription

Otherwise you're asking every single row to show you 3 most recent rows with the same contributor_prescription, needlessly multiplying everything. Note the performance of that isn't great.

There's also a very brief method of collecting into an array, taking a slice and unnesting that:

select contributor_prescription, unnest(arr)
from (select contributor_prescription, 
            (array_agg(date_publication_prescription 
                       order by date_publication_prescription desc))[:3] arr
      from activite.metadonnee
      group by 1) a;

But the most performant would be the infamous (still emulated) index skip scan: demo

WITH RECURSIVE t AS (
   SELECT contributor_prescription,
          1 as rank_pos,
          max(date_publication_prescription) AS date_publication_prescription
   FROM activite.metadonnee 
   GROUP BY contributor_prescription
   UNION ALL
   SELECT contributor_prescription,
          rank_pos+1,
          (SELECT max(date_publication_prescription) 
           FROM activite.metadonnee 
           WHERE date_publication_prescription < t.date_publication_prescription
           AND contributor_prescription=t.contributor_prescription)
   FROM t 
   WHERE rank_pos<3
   )
SELECT contributor_prescription, date_publication_prescription
FROM t;