How to create a trigram or ngram word with Postgres

585 Views Asked by At

I am trying to create a trigram word based search with Postgres. The idea is to implement a simplistic did you mean.

I would like to have a table with trigram words instead of strings. I do know that Postgres offers trigram for strings (pg_tgrm) but I would like to accomplish this:

` roses beautiful red colar sun`

trigram word:

[`roses beautiful red`, `beautiful red colar`, `red colar sun`]

How would be the most effective and fast way of achieving such in a query.

Select column from table -- transforming into the above for each row?

I have tried:

with words as (
 select unnest(regexp_split_to_array(`roses beautiful red colar sun`,'\s+')) as c from col
)
select c1.c || c2.c
from words c1
cross join words c2;

But I don't know how I would use cross join for a more advanced scenario.

1

There are 1 best solutions below

0
On BEST ANSWER

You could use the power of PostgreSQL full text search with the following function:

CREATE FUNCTION phrase_trigram(regconfig, text) RETURNS tsquery
   LANGUAGE plpgsql AS
$$DECLARE
   words text[];
   i integer;
   result tsquery;
   q tsquery;
BEGIN
   /* split the string into an array of words */
   words := regexp_split_to_array($2, '[[:space:]]+');

   FOR i IN 1..cardinality(words) - 2 LOOP
      /* a phrase consisting of three consecutive words */
      q := phraseto_tsquery($1, array_to_string(words[i:i+2], ' '));
      IF result IS NULL THEN
         result := q;
      ELSE
         /* append with "or" */
         result := result || q;
      END IF;
   END LOOP;

   RETURN result;
END;$$;

That constructs a full text search query that tests for the "tri-word" phrases you want.

Use it like this:

SELECT to_tsvector('english', 'a text containing beautiful red colar')
       @@ phrase_trigram('english', 'roses beautiful red colar sun'::text);