After parsing many documents, I have a lot of rows/columns with Ukrainian text that should be indexed for full-text search in Postgres.
I've found that Postgres 14 supports by default 29 languages, but unfortunately not the Ukrainian one.
After subsequent digging, I've found that it allows adding an external dictionary:
CREATE TEXT SEARCH DICTIONARY my_lang_ispell (
TEMPLATE = ispell,
DictFile = path_to_my_lang_dict_file,
AffFile = path_to_my_lang_affixes_file,
StopWords = path_to_my_lang_astop_words_file
);
But how to find the most relevant DictFile, AffFile, and StopWords files? For example, snowball source doesn't contain this language.
So, could anyone help me find the best way to obtain ispell, aspell, snowball, or another dictionary for the Ukrainian language?
Thanks!
After more deep exploring, found the solution on this resource dict_uk
Or just download & unzip the latest hunspell-uk_UA_X.X.X.zip from here and stop words file
ukrainianlanguage in the Postgres:Now it is available for creating a searchable column with help of
to_tsvector:This example shows the correct stemming for the Ukrainian language:
Results
The Postgres full-text search works as well as similar search text engine SphinxSearch in terms of quality, but it is a bit slower.
On the same query from the huge amount of records (278_000) it returns the same results:
Thank you very much, dict_uk support team!