Why won't the default PostgreSQL text search parser tokenize Georgian words?

Asked by Andrew Slock on 05 June 2023 at 08:36

I am trying to get PostgreSQL full text search to work with the Georgian language. I created a dictionary:

create text search dictionary georgian_hunspell(...);

and a configuration with the default parser:

CREATE TEXT SEARCH CONFIGURATION georgian_hunspell_configuration (parser = default);

But the default parser won't tokenize Georgian words; instead, it classifies them as "Space symbols". Why? Georgian is written with an alphabet, like most languages. It has no logographic characters like, say, Chinese; it is just words separated by spaces. Georgian letters are also part of Unicode and can be encoded in UTF-8.

Executing ts_debug shows this:

Some random sequence of English letters is recognized as "Word, all ASCII":

select * from ts_debug('georgian_hunspell_configuration', 'dsfasdfs');

Some random sequence of English and, say, German/Swedish letters is recognized as "Word, all letters":

select * from ts_debug('georgian_hunspell_configuration', 'äpfasdfpeltäfasdf');

Some random sequence of Armenian letters is also recognized as "Word, all letters":

select * from ts_debug('georgian_hunspell_configuration', 'կդտըպլՔՂԱ');

But any sequence of Georgian letters, random or meaningful, is recognized as "Space symbols":

select * from ts_debug('georgian_hunspell_configuration', 'ასდადფ');

Why does that happen? Because the default parser does not assign the correct token type, the token is never looked up in my dictionaries, and so the search fails.
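To separate the parser from the configuration and its dictionaries, the parser can be called directly. This is only a diagnostic sketch: it uses the built-in ts_token_type and ts_parse functions with the sample strings from above, and it does not assume anything about how georgian_hunspell is defined.

```sql
-- List every token type the default parser can emit, with its numeric id.
SELECT tokid, alias, description
FROM ts_token_type('default');

-- Run only the parser (no configuration, no dictionaries) on a Latin
-- and a Georgian sample, and show which token type each piece gets.
SELECT p.tokid, t.alias, p.token
FROM ts_parse('default', 'dsfasdfs ასდადფ') AS p
JOIN ts_token_type('default') AS t USING (tokid);
```

If the Georgian sample already comes back with the "blank" alias here, the classification happens in the parser itself and the dictionary setup is not involved.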

This is my PostgreSQL version: PostgreSQL 14.7 on aarch64-apple-darwin20.6.0, compiled by Apple clang version 12.0.5 (clang-1205.0.22.9), 64-bit
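As far as I understand, the default parser decides whether a non-ASCII character is a letter by using the locale-dependent character classification of the operating system's C library, so the database's encoding and LC_CTYPE may matter here. That is an assumption about the cause, not a confirmed diagnosis. A minimal query to report those settings for the current database:

```sql
-- Show the encoding and locale settings of the database in use.
-- datctype is the LC_CTYPE value that governs character classification.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
WHERE datname = current_database();
```

Running the same ts_debug calls on a server with a different operating system or LC_CTYPE would show whether the result depends on the platform.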
