Let's say I have the following PostgreSQL table called products:
CREATE TABLE IF NOT EXISTS mytable (
id serial NOT NULL PRIMARY KEY,
label VARCHAR(50) NOT NULL,
info jsonb NOT NULL,
created_at timestamp NOT NULL DEFAULT now()
);
Here is some test data. Note my actual table has millions of records.
INSERT INTO products (label, info) VALUES ('a', '[1, 2, 3]');
INSERT INTO products (label, info) VALUES ('a', '[1, 2, 3]');
INSERT INTO products (label, info) VALUES ('c', '[1, 2, 3]');
INSERT INTO products (label, info) VALUES ('c', '[1, 2, 3]');
INSERT INTO products (label, info) VALUES ('b', '[1, 2, 3]');
I want to write a query that grabs distinct labels and orders the records by the created_at field. My first instinct would be to write the following query:
SELECT DISTINCT ON (label) * FROM products ORDER BY created_at DESC;
However, this fails with the following error:
ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
Looks like I can fix this using SQL subqueries:
SELECT * FROM (
SELECT DISTINCT ON (label) * FROM products
) AS subquery ORDER BY created_at DESC;
which generates the following expected result:
id | label | info | created_at
----+-------+-----------+----------------------------
5 | b | [1, 2, 3] | 2022-11-14 03:32:23.245669
3 | c | [1, 2, 3] | 2022-11-14 03:32:23.242813
1 | a | [1, 2, 3] | 2022-11-14 03:32:23.239791
Is this the best way to fix this issue? Or is there a faster way to query this data? Note, I mention above how my actual table has millions of records so I want to come up with the best query possible.
Leading
ORDER BYexpressions must matchDISTINCT ONexpressions (and vice versa):See:
It's unclear whether you want
created_at ASCorcreated_at DESC: you show the latter, but the results you call "expected" match the former - because your "fix" does not do what you seem to think it's doing.To sort resulting (distinct) rows by
created_at DESC, you have to run an outerSELECTwith a different sort order:See:
For a big table, be sure to have an index on
(label, created_at)or(label, created_at DESC), respectively.There may be (much) faster solutions depending on undisclosed details. Most importantly:
SELECT *, or isSELECT label, created_atall you need?