In Postgres, I can say select avg(size) from images and select max(size) from images.
But when I want the mode, I may not do this:
select mode(uploaded_by_id) from images
Instead I must do this:
select mode() within group (order by uploaded_by_id desc) from images
The syntax seems a little funky to me. Does anyone know why the other syntax was not permitted?
NOTE: I know that allowing order by enables the user to define which mode to take in the case of a tie, but I don't see why that needs to prohibit the other syntax entirely.
Thanks!
There is no "machine formula" for computing the mode the way there are for those other things. For the min or max, you just track of the min or max seen so far. For average, you can just keep track of the sum and count seen so far, for example. With the mode, you need to have all the data at your fingertips.
Using an ordered-set aggregate provides for such a use case automatically, including spooling the data to temp files on disk as it becomes large.
You could instead write code to aggregate the data into memory and then process it from there (as the other answer references), but this would become slow and prone to crashing as the amount of memory needed starts to exceed the amount available.