Why is the Postgres "mode" function so different from "avg", "max", and other aggregates?

846 Views Asked by At

In Postgres, I can say select avg(size) from images and select max(size) from images.

But when I want the mode, I may not do this:

select mode(uploaded_by_id) from images

Instead I must do this:

select mode() within group (order by uploaded_by_id desc) from images

The syntax seems a little funky to me. Does anyone know why the other syntax was not permitted?

NOTE: I know that allowing order by enables the user to define which mode to take in the case of a tie, but I don't see why that needs to prohibit the other syntax entirely.

Thanks!

2

There are 2 best solutions below

0
jjanes On BEST ANSWER

There is no "machine formula" for computing the mode the way there are for those other things. For the min or max, you just track of the min or max seen so far. For average, you can just keep track of the sum and count seen so far, for example. With the mode, you need to have all the data at your fingertips.

Using an ordered-set aggregate provides for such a use case automatically, including spooling the data to temp files on disk as it becomes large.

You could instead write code to aggregate the data into memory and then process it from there (as the other answer references), but this would become slow and prone to crashing as the amount of memory needed starts to exceed the amount available.

0
Connor Willoughby On

After looking at the documentation it appears as though they moved away from a simple function in favour of the window function, theyre citing speed advantages as a reason for this.

https://wiki.postgresql.org/wiki/Aggregate_Mode

If you wanted to you could just create a function yourself but it seems as though the window function is the fastest way to get a NOT NULL result back from the db.