How do I use the trigram tokenizer/similarity option with Peewee and SQLite's FTS5?

This question concerns how to use FTS5's trigram tokenizer with Peewee.

  1. The official FTS5 documentation for SQLite cites support for trigram tokenization/similarity:

     > The experimental trigram tokenizer extends FTS5 to 
     > support substring matching in general, instead of the 
     > usual token matching. When using the trigram tokenizer,
     > a query or phrase token may match any sequence of 
     > characters within a row, not just a complete token.
     > 
     > CREATE VIRTUAL TABLE tri USING fts5(a, tokenize="trigram");
     > INSERT INTO tri VALUES('abcdefghij KLMNOPQRST uvwxyz');
    
  2. I've tried setting up an FTS-based model class with Peewee. I changed the options to use the trigram tokenizer:

     class Meta:
         db_table = 'fts_test_db'
         database = test_db
         options = {'tokenize': 'trigram', 'content': PrecedentPW}
    
  3. When I attempt to create a table with those options, I get this error:

     _db.create_tables([_fts], )
    
     >> peewee.OperationalError: no such tokenizer: trigram
    
  4. But if I change the tokenizer options to use something else (e.g. 'porter'), no errors are raised.

How can I use the trigram tokenizer with Peewee?

Best answer

You may need to compile the tokenizer yourself or make sure you are running a new enough version of SQLite. The trigram tokenizer was not included by default until SQLite 3.34.0: https://www.sqlite.org/releaselog/3_34_0.html
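
To check which case you are in, something like the following might help. It is only a rough sketch, and it assumes Peewee is going through Python's standard-library `sqlite3` module (if you have a replacement driver installed, the version it reports may differ). The helper name `trigram_available` is made up for this example. It prints the version of the SQLite library Python is linked against and then tries to create a throwaway trigram table in an in-memory database:

```python
import sqlite3


def trigram_available():
    """Return True if the linked SQLite build accepts tokenize='trigram'.

    Hypothetical helper for illustration: it probes a scratch in-memory
    database, so it never touches your real database file.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE VIRTUAL TABLE tri_probe USING fts5(a, tokenize='trigram')"
        )
    except sqlite3.OperationalError:
        # Raised for "no such tokenizer: trigram" and also when FTS5
        # itself is not compiled in ("no such module: fts5").
        return False
    finally:
        conn.close()
    return True


# Version of the SQLite C library, not of the Python sqlite3 module.
print("SQLite library version:", sqlite3.sqlite_version)
print("trigram tokenizer available:", trigram_available())
```

If the version printed is below 3.34.0, the probe should fail with the same error Peewee reports, and upgrading the SQLite library that Python uses is probably the fix.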