Random slow SQLite queries during read (SELECT with JOIN on FTS5 table), possibly due to cache misses


We are seeing inconsistent behaviour with occasionally slow reads, which I believe is caused by cache misses.

Some background on the server setup:

AWS EC2 instance with 4 GB of RAM, running Ubuntu. Three Node.js processes under PM2 access the SQLite databases: one writes, two read. Access is through better-sqlite3 version 7.5.3, which uses SQLite version 3.38.5. The databases run on the default settings, so no WAL, since we want maximum read performance. Reads and writes never happen simultaneously: the file is first copied to a temp location, written to there, and the new file then overwrites the old read-only one (see the sketch below).
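For concreteness, the swap flow looks roughly like this (a sketch with made-up names and paths, not our exact code):

const fs = require('fs');
const Database = require('better-sqlite3');

// Sketch of the update flow: write into a temp copy of the user's db,
// then rename it over the read-only original.
function updateUserDb(dbPath) {
  const tmpPath = dbPath + '.tmp';
  fs.copyFileSync(dbPath, tmpPath);

  const db = new Database(tmpPath);
  // ... apply the writes to the temp copy here ...
  db.close();

  fs.renameSync(tmpPath, dbPath); // old inode is freed once readers close it
}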

FTS5 table definition:

CREATE VIRTUAL TABLE IF NOT EXISTS Fts_table USING fts5(
    _id UNINDEXED,
    field1, field2, field3, field4, field5, field6, field7, field8, field9,
    field10, field11, field12, field13, field14, field15, field16, field17,
    prefix='1 2 3 4 5',
    content=Content,
    content_rowid='rowid',
    tokenize="unicode61 remove_diacritics 2 tokenchars '~'"
);

So as not to cause confusion with the real names, and since all of the fields are matched against, I've numbered them to give an idea of how many there are. We use triggers to keep the FTS5 table and the external content table in sync, as specified in the docs (sketched below).
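For reference, the sync triggers follow the external-content pattern from the FTS5 docs. A minimal sketch, reduced to two fields for brevity (the real table has seventeen) and with a hypothetical path:

const Database = require('better-sqlite3');
const db = new Database('/path/to/user.db'); // hypothetical path

// External-content sync triggers, per the FTS5 docs. Note the 'delete'
// insert must pass the old values of every indexed column.
db.exec(`
  CREATE TABLE IF NOT EXISTS Content(_id TEXT, field1 TEXT, field2 TEXT);
  CREATE VIRTUAL TABLE IF NOT EXISTS Fts_table USING
    fts5(_id UNINDEXED, field1, field2, content=Content, content_rowid='rowid');

  CREATE TRIGGER IF NOT EXISTS Content_ai AFTER INSERT ON Content BEGIN
    INSERT INTO Fts_table(rowid, _id, field1, field2)
      VALUES (new.rowid, new._id, new.field1, new.field2);
  END;

  CREATE TRIGGER IF NOT EXISTS Content_ad AFTER DELETE ON Content BEGIN
    INSERT INTO Fts_table(Fts_table, rowid, _id, field1, field2)
      VALUES ('delete', old.rowid, old._id, old.field1, old.field2);
  END;

  CREATE TRIGGER IF NOT EXISTS Content_au AFTER UPDATE ON Content BEGIN
    INSERT INTO Fts_table(Fts_table, rowid, _id, field1, field2)
      VALUES ('delete', old.rowid, old._id, old.field1, old.field2);
    INSERT INTO Fts_table(rowid, _id, field1, field2)
      VALUES (new.rowid, new._id, new.field1, new.field2);
  END;
`);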

Case:

Multiple users run search queries, each against their own SQLite file. Each file contains an FTS5 table and its external content table, which holds some extra fields. What the user gets is essentially a MATCH on the FTS5 table, then a JOIN on the external content table where the full data lives. After each search we explicitly call the close method of the Node wrapper (better-sqlite3, as mentioned) to release the file's inode, so we can overwrite the file after an update. Users typically have thousands of entries in the FTS5 table, and the database files can run into the hundreds of thousands of megabytes.
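The read path is roughly this (a sketch; the table and column names match the schema above, the rest is illustrative):

const Database = require('better-sqlite3');

// Open read-only, run the MATCH + JOIN, and close immediately so the
// updater can later swap the file out from under us.
function search(dbPath, term) {
  const db = new Database(dbPath, { readonly: true });
  try {
    return db.prepare(`
      SELECT Content.*
      FROM Fts_table
      JOIN Content ON Content.rowid = Fts_table.rowid
      WHERE Fts_table MATCH ?
      ORDER BY rank
    `).all(term);
  } finally {
    db.close(); // release the inode so the file can be overwritten
  }
}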

What is going on is that most of the time users get normal search query (SELECT) times, around 50 ms or less, but sometimes this spikes to 2-3 seconds, seemingly at random. It looks like a caching issue, since the spike typically happens on a user's "first" search, meaning it's been some time since they last connected to the server and did something, or the Node processes have been restarted through PM2. But it also happens at random after a user has searched, even seconds earlier, and this is the problematic part.

One thing I have noticed fairly consistently is that if I search for one token, like "per", and then an unrelated one, like "con", the second query is still slow. So I suspect it's due to these pages not being loaded into memory, but again, not consistently, since I don't know when this cache expires, and the slowness then randomly pops up later.
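To try to separate cold-cache spikes from warm queries, I've been timing individual statements with something like this (a hypothetical helper, not in our codebase):

// Time a single search so cold-start spikes stand out in the logs.
function timedSearch(db, term) {
  const t0 = process.hrtime.bigint();
  const rows = db.prepare(`
    SELECT Content.*
    FROM Fts_table
    JOIN Content ON Content.rowid = Fts_table.rowid
    WHERE Fts_table MATCH ?
  `).all(term);
  const ms = Number(process.hrtime.bigint() - t0) / 1e6;
  console.log(`match "${term}": ${rows.length} rows in ${ms.toFixed(1)} ms`);
  return rows;
}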

I am hitting a wall on this, since as far as I can gather from our logs, this happens, as I said, at random, with no event between a previous "normal" user search and the next, noticeably slow one. Is there some PRAGMA setting that could be affecting it in this scenario? I tried increasing mmap_size to 1 GB, thinking it would map most of the db so it's more easily readable, but I have yet to confirm that it changed anything.
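For reference, this is how I'm setting it (the values are experiments, not confirmed fixes; note that both mmap_size and cache_size apply per connection, so a freshly opened handle always starts cold):

const Database = require('better-sqlite3');
const db = new Database('/path/to/user.db', { readonly: true });

db.pragma('mmap_size = 1073741824'); // 1 GB, as mentioned above
db.pragma('cache_size = -65536');    // 64 MB page cache (negative = size in KiB)

// Optional warm-up: touch the table once so the first real query
// doesn't pay the full cold-read penalty (assumption: the OS page
// cache keeps these pages hot between searches).
db.prepare('SELECT count(*) FROM Fts_table').get();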

Thanks
