Suppose the following:
create schema bv;
create table bv.user(id bigint primary key);
create table bv.user_photo (
id bigint primary key,
url varchar(255) not null,
user_id bigint references bv.user(id)
);
insert into bv.user values (100), (101);
insert into bv.user_photo values
(1, 'https://1.com', 100),
(3, 'https://3.com', 100),
(4, 'https://4.com', 101),
(2, 'https://2.com', 100);
I'd like to build an object for every user that includes only the latest image (the photo with the highest id).
Here's what I have:
select
json_build_object(
'id', u.id,
'url', up.url
) user
from bv.user u
left join bv.user_photo up
on u.id = up.user_id
However, this returns one row per photo:
[
{"id" : 100, "url" : "https://2.com"},
{"id" : 100, "url" : "https://3.com"},
{"id" : 100, "url" : "https://1.com"},
{"id" : 101, "url" : "https://4.com"}
]
The expected result, keeping only the latest photo per user, is:
[
{"id" : 100, "url" : "https://3.com"},
{"id" : 101, "url" : "https://4.com"}
]
I've tried using DISTINCT ON:
select distinct on(u.id)
json_build_object(
'id', u.id,
'url', up.url
) user
from bv.user u
left join bv.user_photo up
on u.id = up.user_id
order by u.id, up.id DESC
My question is whether this is the correct approach. I feel like I shouldn't be using DISTINCT in this situation.
With few photos per user, and while you return all (or most) users, DISTINCT ON is the best approach. But it's typically faster to get distinct photos before you join:
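Here is a sketch of the "distinct photos first, then join" variant. It assumes the bv schema and sample data from the question, and it takes "latest" to mean the highest user_photo.id. The column alias user_obj is a hypothetical name, chosen because user is a reserved word:

```sql
-- Reduce user_photo to one row per user (highest id) first,
-- then LEFT JOIN that small derived table to all users.
SELECT json_build_object('id', u.id, 'url', p.url) AS user_obj
FROM   bv.user u
LEFT   JOIN (
   SELECT DISTINCT ON (user_id)
          user_id, url
   FROM   bv.user_photo
   ORDER  BY user_id, id DESC   -- highest id per user_id comes first
   ) p ON p.user_id = u.id
ORDER  BY u.id;
```

With the sample data, this yields one row per user: 100 with https://3.com and 101 with https://4.com.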
For many photos per user, an emulated index-skip scan is (much) faster.
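One common way to emulate an index-skip scan in PostgreSQL is a recursive CTE. Each step jumps to the next user_id and fetches only that user's latest photo. The sketch below assumes the question's schema and an index on user_photo (user_id, id DESC). The CTE name latest and the alias user_obj are hypothetical:

```sql
-- Emulated index-skip scan: each recursive step seeks the next user_id
-- and takes only its latest photo (highest id) with LIMIT 1.
WITH RECURSIVE latest AS (
   (  -- parentheses are required to use ORDER BY / LIMIT in this term
   SELECT user_id, id, url
   FROM   bv.user_photo
   WHERE  user_id IS NOT NULL
   ORDER  BY user_id, id DESC
   LIMIT  1
   )
   UNION ALL
   SELECT nxt.*
   FROM   latest c
   CROSS  JOIN LATERAL (
      SELECT p.user_id, p.id, p.url
      FROM   bv.user_photo p
      WHERE  p.user_id > c.user_id   -- skip ahead to the next user
      ORDER  BY p.user_id, p.id DESC
      LIMIT  1
      ) nxt
   )
SELECT json_build_object('id', u.id, 'url', l.url) AS user_obj
FROM   bv.user u
LEFT   JOIN latest l ON l.user_id = u.id
ORDER  BY u.id;
```

The recursion stops when no user_id greater than the last one exists. The work therefore grows with the number of users rather than the number of photos.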
Summary array
To produce one summary array, you can skip json_build_object(). json_agg() can aggregate the row directly.

Notably, all queries so far include the key "url" with a null value where no photo is found. You may want to strip the noise:
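Both summary-array ideas can be sketched as follows, assuming the question's schema. json_agg() turns each row of the derived table into an object with keys "id" and "url". json_strip_nulls() then removes keys whose value is null. The alias users is hypothetical:

```sql
-- One JSON array for all users; each row of "sub" becomes {"id": ..., "url": ...}
SELECT json_agg(sub ORDER BY sub.id) AS users
FROM  (
   SELECT DISTINCT ON (u.id)
          u.id, up.url
   FROM   bv.user u
   LEFT   JOIN bv.user_photo up ON up.user_id = u.id
   ORDER  BY u.id, up.id DESC
   ) sub;

-- Same array, but users without any photo get no "url": null key
SELECT json_strip_nulls(json_agg(sub ORDER BY sub.id)) AS users
FROM  (
   SELECT DISTINCT ON (u.id)
          u.id, up.url
   FROM   bv.user u
   LEFT   JOIN bv.user_photo up ON up.user_id = u.id
   ORDER  BY u.id, up.id DESC
   ) sub;
```

With the sample data, both queries return [{"id":100,"url":"https://3.com"}, {"id":101,"url":"https://4.com"}]. Every user has a photo, so json_strip_nulls() has nothing to remove here.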
For a small selection of users
This seems to be your use case.
A LATERAL subquery is typically faster. It also deals with many photos per user efficiently.

Summary array for a small selection, skipping null values
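A LATERAL join for a small selection of users might look like the sketch below. It assumes the question's schema. The filter u.id IN (100, 101) is a hypothetical stand-in for "a small selection", and user_obj and users are hypothetical aliases:

```sql
-- Latest photo per selected user: the LATERAL subquery runs once per user
-- and stops after the first row (highest id) because of LIMIT 1.
SELECT json_build_object('id', u.id, 'url', p.url) AS user_obj
FROM   bv.user u
LEFT   JOIN LATERAL (
   SELECT up.url
   FROM   bv.user_photo up
   WHERE  up.user_id = u.id
   ORDER  BY up.id DESC
   LIMIT  1
   ) p ON true
WHERE  u.id IN (100, 101)   -- hypothetical small selection
ORDER  BY u.id;

-- Summary array for the same selection; json_strip_nulls() drops "url": null
SELECT json_strip_nulls(json_agg(sub ORDER BY sub.id)) AS users
FROM  (
   SELECT u.id, p.url
   FROM   bv.user u
   LEFT   JOIN LATERAL (
      SELECT up.url
      FROM   bv.user_photo up
      WHERE  up.user_id = u.id
      ORDER  BY up.id DESC
      LIMIT  1
      ) p ON true
   WHERE  u.id IN (100, 101)
   ) sub;
```

LEFT JOIN LATERAL ... ON true keeps users without photos. A plain CROSS JOIN LATERAL would drop them.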
Index
DISTINCT ON does not need an index. All other queries absolutely need an index on user_photo(user_id, id). Or even, ideally:

Aside: Don't use the reserved word "user" as an identifier. It works while schema-qualified, but fails without.
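The index recommendation in the Index section might be written as follows. The first index matches the stated user_photo(user_id, id). The second, "ideal" form is an assumption: it uses id DESC to match ORDER BY user_id, id DESC, and INCLUDE (url), which requires PostgreSQL 11 or later, so the queries can be answered from the index alone. Index names are left to PostgreSQL:

```sql
-- Minimum: lets the skip-scan and LATERAL queries find photos by user quickly.
CREATE INDEX ON bv.user_photo (user_id, id);

-- Assumed "ideal" variant: sort order matches ORDER BY user_id, id DESC, and
-- url is stored in the index, which can enable index-only scans.
-- You would normally keep only one of the two indexes.
CREATE INDEX ON bv.user_photo (user_id, id DESC) INCLUDE (url);
```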