I have this function in PostgreSQL, but I don't know how to return the result of the query:
CREATE OR REPLACE FUNCTION wordFrequency(maxTokens INTEGER)
RETURNS SETOF RECORD AS
$$
BEGIN
SELECT text, count(*), 100 / maxTokens * count(*)
FROM (
SELECT text
FROM token
WHERE chartype = 'ALPHABETIC'
LIMIT maxTokens
) AS tokens
GROUP BY text
ORDER BY count DESC
END
$$
LANGUAGE plpgsql;
But I don't know how to return the result of the query inside the PostgreSQL function.
I found that the return type should be SETOF RECORD, right? But the return command is not right.
What is the right way to do this?
Use
RETURN QUERY:Call:
Defining the return type explicitly is much more practical than returning a generic
record. This way you don't have to provide a column definition list with every function call.RETURNS TABLEis one way to do that. There are others. Data types ofOUTparameters have to match exactly what is returned by the query.Choose names for
OUTparameters carefully. They are visible in the function body almost anywhere. Table-qualify columns of the same name to avoid conflicts or unexpected results. I did that for all columns in my example.But note the potential naming conflict between the
OUTparametercntand the column alias of the same name. In this particular case (RETURN QUERY SELECT ...) Postgres uses the column alias over theOUTparameter either way. This can be ambiguous in other contexts, though. There are various ways to avoid any confusion:ORDER BY 2 DESC. Example:ORDER BY count(*).plpgsql.variable_conflictor use the special command#variable_conflict error | use_variable | use_columnin the function. See:Don't use "text" or "count" as column names. Both are legal to use in Postgres, but "count" is a reserved word in standard SQL and a basic function name and "text" is a basic data type. Can lead to confusing errors. I use
txtandcntin my examples, you may want more explicit names.Added a missing
;and corrected a syntax error in the header.(_max_tokens int), not(int maxTokens)- data type after name.While working with integer division, it's better to multiply first and divide later, to minimize the rounding error. Or work with
numericor a floating point type. See below.Alternative
This is what I think your query should actually look like (calculating a relative share per token):
The expression
sum(t.cnt) OVER ()is a window function. You could use a CTE instead of the subquery. Pretty, but a subquery is typically cheaper in simple cases like this one (mostly before Postgres 12).A final explicit
RETURNstatement is not required (but allowed) when working withOUTparameters orRETURNS TABLE(which makes implicit use ofOUTparameters).round()with two parameters only works fornumerictypes.count()in the subquery produces abigintresult and asum()over thisbigintproduces anumericresult, thus we deal with anumericnumber automatically and everything just falls into place.