Let's take the following two queries:
select greatest(1, val) from unnest([null]) as val
# null
And:
select sum(val) from unnest([1, null]) as val
# 1
My question is: why is null handled differently between aggregate functions and ordinary scalar functions? Of course, the second query is more useful when null values are not counted, but I would think the first one would also be more useful if it returned 1 instead of null. If null is an 'unknown value', then wouldn't both functions have a supposedly unknown answer (i.e., null), not just the latter?
If there is a historical reason for this, it would be great to know as well. Hopefully someone can shed some light on why there's a difference between the two.
Aggregate functions are supposed to skip null values; the SQL-92 standard spells this out in the general rules for the &lt;set function specification&gt;: null values are eliminated from the argument before the function is applied, and the warning "null value eliminated in set function" is raised when that happens.
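As a quick illustration (a sketch in the same BigQuery-style unnest syntax as the question; the column aliases are just for readability), the null element is dropped by count(val) and sum(val), while count(*) still counts the row:

select count(*) as total_rows, count(val) as non_null_rows, sum(val) as total
from unnest([1, 2, null]) as val
# total_rows = 3, non_null_rows = 2, total = 3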
The greatest function, on the other hand, should return null if any argument is null¹. Generally speaking, scalar functions return null if any of their arguments is null². Here is Ask Tom's take on this exact same issue; I personally agree with everything that is said in that answer.
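If you want the first query to return 1 instead, the usual approach is to supply a non-null fallback yourself before the scalar function sees the value, e.g. with coalesce (again just a sketch in the question's BigQuery-style syntax):

select greatest(1, val) as raw, greatest(1, coalesce(val, 1)) as with_fallback
from unnest([null]) as val
# raw = null, with_fallback = 1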
¹ Different implementations of the greatest function existed long before it was standardized. MySQL and Oracle return null if any argument is null, while PostgreSQL and SQL Server ignore null values.
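To make that concrete, the same expression gives different results depending on the engine (as documented by each vendor; Oracle additionally requires a from dual clause):

select greatest(1, null)
-- MySQL, Oracle:          null
-- PostgreSQL, SQL Server: 1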
² Conversely, vendor-specific scalar functions exist that skip null values. One example is the concat_ws function (MySQL, PostgreSQL, SQL Server), which is designed to simplify concatenation of (possibly null) strings using a separator.
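For example, the null argument is simply skipped rather than propagated (the same result in all three engines):

select concat_ws('-', 'a', null, 'b')
-- 'a-b'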