I use PostgreSQL 10 and I run CREATE EXTENSION unaccent;
succesfully. I have a plgsql function that contains the following
whereText := 'lower(unaccent(place.name)) LIKE lower(unaccent($1))';
later, according to what user chose, more clauses may be added to the whereText
.
The whereText
is finally used in the query:
placewithkeys := '%'||placename||'%';
RETURN QUERY EXECUTE format('SELECT id, name FROM '||fromText||' WHERE '||whereText)
USING placewithkeys , event, date;
The whereText := 'LOWER(unaccent(place.name)) LIKE LOWER(unaccent($1))';
does not work, even if I remove the LOWER
part.
I do select __my_function('Τζι');
and I get nothing back, even though I should get back results, because in the database there is the name Τζίμα
If I remove the unaccent
and leave the LOWER
it works, but not for accents : τζ
brings Τζίμα
back as it should. It seems like the unaccent
is causing a problem.
What am I missing? How can I fix this?
Since there were comments about the syntax and possible SQLi , I provide the whole function definition, now changed to work accent-insensitive and case-insensitive in Greek:
CREATE FUNCTION __a_search_place
(placename text, eventtype integer, eventdate integer, eventcentury integer, constructiondate integer, constructioncentury integer, arstyle integer, artype integer)
RETURNS TABLE
(place_id bigint, place_name text, place_geom geometry)
AS $$
DECLARE
selectText text;
fromText text;
whereText text;
usingText text;
placewithkeys text;
BEGIN
fromText := '
place
JOIN cep ON place.id = cep.place_id
JOIN event ON cep.event_id = event.id
';
whereText := 'unaccent(place.name) iLIKE unaccent($1)';
placewithkeys := '%'||placename||'%';
IF constructiondate IS NOT NULL OR constructioncentury IS NOT NULL OR arstyle IS NOT NULL OR artype IS NOT NULL THEN
fromText := fromText || '
JOIN construction ON cep.construction_id = construction.id
JOIN construction_atype ON construction.id = construction_atype.construction_id
JOIN construction_astyle ON construction.id = construction_astyle.construction_id
JOIN atype ON atype.id = construction_atype.atype_id
JOIN astyle ON astyle.id = construction_astyle.astyle_id
';
END IF;
IF eventtype IS NOT NULL THEN
whereText := whereText || 'AND event.type = $2 ';
END IF;
IF eventdate IS NOT NULL THEN
whereText := whereText || 'AND event.date = $3 ';
END IF;
IF eventcentury IS NOT NULL THEN
whereText := whereText || 'AND event.century = $4 ';
END IF;
IF constructiondate IS NOT NULL THEN
whereText := whereText || 'AND construction.date = $5 ';
END IF;
IF constructioncentury IS NOT NULL THEN
whereText := whereText || 'AND construction.century = $6 ';
END IF;
IF arstyle IS NOT NULL THEN
whereText := whereText || 'AND astyle.id = $7 ';
END IF;
IF artype IS NOT NULL THEN
whereText := whereText || 'AND atype.id = $8 ';
END IF;
whereText := whereText || '
GROUP BY place.id, place.geom, place.name
';
RETURN QUERY EXECUTE format('SELECT place.id, place.name, place.geom FROM '||fromText||' WHERE '||whereText)
USING placewithkeys, eventtype, eventdate, eventcentury, constructiondate, constructioncentury, arstyle, artype ;
END;
$$
LANGUAGE plpgsql;
Postgres 12
unaccent()
now works for Greek letters, too. Diacritic signs are removed:db<>fiddle here
Quoting the release notes:
Postgres 11 or older
unaccent()
does not yet work for Greek letters. The call:... returns all letters unchanged, no diacritic signs removed as we would expect.
(I extracted this list from the Wikipedia page on Greek diacritics.)
db<>fiddle here
Looks like a shortcoming of the unaccent module. You can extend the default
unaccent
dictionary or create your own. There are instructions in the manual. I created several dictionaries in the past and it's simple. And you are not to first to need this:Postgres unaccent rules for greek characters:
Unaccent rules plus greek characters for Postgres 9.6:
You need write access to the file system of the server, though - the directory containing the unaccent files. So, not possible on most cloud services ...
Or you might report a bug and ask to include Greek diacritic signs.
Aside: Dyamic SQL and SQLi
The code fragments you presented are not vulnerable to SQL injection.
$1
is concatenated as literal string and only resolved in theEXECUTE
command later, where the value is safely passed with theUSING
clause. So, no unsafe concatenation there. I would do it like this, though:Notes:
Less confusing - your original even confused Pavel in the comments, a professional in the field.
Assignments in plpgsql are slightly expensive (more so than in other PL), so adopt a coding style with few assignments.
Concatenate the two
%
symbols forLIKE
into the main query directly, giving the query planner the information that the pattern is not anchored to start or end, which may help a more efficient plan. Only the user input is (safely) passed as variable.Since your
WHERE
clause references tableplace
, TheFROM
clause needs to include this table anyway. So you cannot concatenate the FROM clause independently to begin with. Probably better to keep it all in a singleformat()
.Use dollar-quoting so you don't have to escape single quotes additionally.
Maybe just use
ILIKE
instead oflower(...) LIKE lower(...)
. If you work with trigram indexes (like would seem best for this query): those work withILIKE
as well:I assume you are aware that you may need to escape characters with special meanings in
LIKE
pattern?Audited function
After you provided your complete function ...
This is a complete rewrite with several improvements. Should make the function considerably. Also SQLi-safe (like your original). Should be functionally identical except the cases where I join fewer tables, which might not filter rows that are filtered by joining to the tables alone.
Major features:
Use
EXISTS()
instead of lots of joins in the outer level plusGROUP BY
. This contributes the lion share to the better performance. Related:format()
is typically a good choice to concatenate SQL from user input. But since you encapsulated all code elements and only pass flags, you don't need it in this case. Instead,concat_ws()
is of help. Related:Only concatenate JOINs you actually need.
Fewer assignments, shorter code.
Default values for parameters. Allows simplified call with missing parameters. Like:
Related:
About the short
ROW()
syntax for testing whether any value isNOT NULL
: