I am trying to get PostgreSQL full text search to work with the Georgian language. I created a dictionary:
CREATE TEXT SEARCH DICTIONARY georgian_hunspell (...);
and a configuration with the default parser:
CREATE TEXT SEARCH CONFIGURATION georgian_hunspell_configuration (parser = default);
But the default parser does not tokenize Georgian words; it classifies them as "Space symbols" instead. Why? Georgian is written with an alphabet, like most languages. Unlike, say, Chinese, it is just words separated by spaces. Georgian letters are also part of Unicode and can be encoded in UTF-8.
Executing ts_debug shows this:
- Some random sequence of English letters is recognized as "Word, all ASCII":
select * from ts_debug('georgian_hunspell_configuration', 'dsfasdfs');
- Some random sequence of English and, say, German/Swedish letters is recognized as "Word, all letters":
select * from ts_debug('georgian_hunspell_configuration', 'äpfasdfpeltäfasdf');
- Some random sequence of Armenian letters is also recognized as "Word, all letters":
select * from ts_debug('georgian_hunspell_configuration', 'կդտըպլՔՂԱ');
- But a sequence of Georgian letters, whether random or meaningful, is recognized as "Space symbols":
select * from ts_debug('georgian_hunspell_configuration', 'ასდადფ');
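To rule out my configuration and dictionaries, the parser can also be called directly. This is a minimal sketch using the built-in ts_parse and ts_token_type functions with the 'default' parser. I would expect it to report the same token types as ts_debug, but I am not showing output here:

```sql
-- Run the default parser on each sample string, bypassing the
-- text search configuration and its dictionaries entirely.
SELECT s.input, p.token, tt.alias, tt.description
FROM (VALUES
        ('dsfasdfs'),
        ('äpfasdfpeltäfasdf'),
        ('կդտըպլՔՂԱ'),
        ('ასდადფ')
     ) AS s(input)
CROSS JOIN LATERAL ts_parse('default', s.input) AS p
-- Map the numeric token id to its alias and description.
JOIN ts_token_type('default') AS tt USING (tokid);
```

If the Georgian string is still reported as a blank token here, the problem is in the parser's character classification and not in the dictionary setup.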
Why does that happen? Because the default parser does not assign the correct token type, the token is never looked up in my dictionaries, so the search fails.
This is my PostgreSQL version: PostgreSQL 14.7 on aarch64-apple-darwin20.6.0, compiled by Apple clang version 12.0.5 (clang-1205.0.22.9), 64-bit
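In case it is relevant: the PostgreSQL documentation says the parser's notion of a "letter" is determined by the database's locale setting (lc_ctype). The query below is a sketch of how to show the encoding and locale of the current database. Whether the locale is actually the cause is only a guess on my part:

```sql
-- Show the encoding and locale settings of the current database.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
WHERE datname = current_database();
```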