How might I make advanced queries in SQLite to search for files by tags?

1.1k Views Asked by At

What Am I Doing?

I'm writing a file tagging program in JavaScript using Electron and I'd like to use SQLite. I can't quite figure out how I'd implement searching by tags, though. I'm new to SQL and SQLite, so I'm not sure this is even possible solely through queries. How might I go about searching as described below?

Searching Details:

I have looked into FTS3/4. I can do everything I want with that EXCEPT the wildcard searches, from the looks of it.

  • Search for files with all given tags: blue_sky AND green_grass
  • Search for files without given tags: NOT blue_sky AND NOT green_grass
  • Search for files with some of the given tags: green_sky OR blue_sky
  • Search for files with wildcards anywhere in a tag: *sky AND *grass AND *bl*e*
  • Combinations of the above: blue_sky AND green* / green_grass AND blue_sky OR green_sky

Tables:

May be changed

CREATE TABLE files (
  id INTEGER PRIMARY KEY,
  name TEXT
);

CREATE TABLE tags (
  id INTEGER PRIMARY KEY,
  name TEXT
);

CREATE TABLE file_tags (
  id INTEGER PRIMARY KEY,
  file_id INTEGER,
  tag_id INTEGER
);

Examples:

INSERT INTO files (name) VALUES ('file_1.png');
INSERT INTO files (name) VALUES ('file_2.png');
INSERT INTO files (name) VALUES ('file_3.png');
INSERT INTO files (name) VALUES ('file_4.png');

INSERT INTO tags (name) VALUES ('blue_sky');
INSERT INTO tags (name) VALUES ('green_sky');
INSERT INTO tags (name) VALUES ('green_grass');
INSERT INTO tags (name) VALUES ('blue_grass');
INSERT INTO tags (name) VALUES ('greenish_blue_sky');


INSERT INTO file_tags (file_id, tag_id) VALUES(file1_id, blue_sky_id);
INSERT INTO file_tags (file_id, tag_id) VALUES(file1_id, green_grass_id);

INSERT INTO file_tags (file_id, tag_id) VALUES(file2_id, blue_sky_id);
INSERT INTO file_tags (file_id, tag_id) VALUES(file2_id, blue_grass_id);

INSERT INTO file_tags (file_id, tag_id) VALUES(file3_id, greenish_blue_sky_id);

INSERT INTO file_tags (file_id, tag_id) VALUES(file4_id, green_sky_id);
INSERT INTO file_tags (file_id, tag_id) VALUES(file4_id, blue_grass_id);

Query: blue_sky and green_grass
result: file_1

Query: blue_sky or green_sky
result: file_1, file_2, file_4

Query: blue_sky and green_grass or blue_grass
result: file_1, file_2

Query: *ish*
result: file_3

Query: *bl*e*
result: file_1, file_2, file_3, file_4

Query: *sky and not blue_grass
result: file_1, file3

Note: If SQLite isn't the right tool for the job, I'm open to suggestions.

1

There are 1 best solutions below

6
On BEST ANSWER

In my opinion, you could get this easier with a modified database structure.
E.g.

  • using 'file_id' and 'tag_id' consistently,
    instead of sometimes 'id' in both cases
  • using foreign keys (admittedly that might require an unavailable feature)

You might be able to use tag_ids as inpit in the first few cases, depending on where the keys come from (an "unexpecting user" would of course type colors). That would also reduce the risk of typos.

So what you could do is:

  • use joins of the 'file_tags' table,
    one for each tag you want to use in your logic
  • join the files table, to access file names for output
  • use subqueries for using tag names instead of tag ids or use more joins instead, I demonstrated both below
  • copy the search logic pretty directly into the 'where'
  • group by file names, in order to get each file only once

Based on your nice MCVE, here are proposals for your example queries:

select fs.name from file_tags t1 
         inner join file_tags t2 on t1.file_id = t2.file_id
         inner join files fs on fs.id = t1.file_id
where t1.tag_id = (select id from tags where name = 'blue_sky')
  and t2.tag_id = (select id from tags where name = 'green_grass')
group by fs.name;

select fs.name from file_tags t1
         inner join file_tags t2 on t1.file_id = t2.file_id
         inner join files fs on fs.id = t1.file_id
where t1.tag_id = (select id from tags where name = 'blue_sky')
   or t2.tag_id = (select id from tags where name = 'green_sky')
group by fs.name;

-- note, here I had to derive from your desired output
-- that you want a '()' around the 'or'
select fs.name from file_tags t1
         inner join file_tags t2 on t1.file_id = t2.file_id
         inner join file_tags t3 on t1.file_id = t3.file_id
         inner join files fs on fs.id = t1.file_id
where t1.tag_id = (select id from tags where name = 'blue_sky')
 and (t2.tag_id = (select id from tags where name = 'green_grass')
   or t3.tag_id = (select id from tags where name = 'blue_grass')
     )
group by fs.name;

select fs.name from file_tags t1 
         inner join files fs on fs.id = t1.file_id
         inner join tags ts on ts.id = t1.tag_id
where ts.name like '%ish%'
group by fs.name;

select fs.name from file_tags t1 
         inner join files fs on fs.id = t1.file_id
         inner join tags ts on ts.id = t1.tag_id
where ts.name like '%bl%e%'
group by fs.name;

select fs.name from file_tags t1 
         inner join files fs on fs.id = t1.file_id
         inner join tags ts on ts.id = t1.tag_id
where ts.name like '%sky' and not ts.name = 'blue_grass'
group by fs.name;

select name from file_tags t1
         inner join files fs on t1.file_id = fs.id
where (select name from tags where id = t1.tag_id) like "%sky"
and not file_id in
        (select file_id from file_tags 
         where tag_id = (select id from tags where name = 'blue_grass')
        );

Output:

name
----------
file_1.png
name
----------
file_1.png
file_2.png
file_4.png
name
----------
file_1.png
file_2.png
name
----------
file_3.png
name
----------
file_1.png
file_2.png
file_3.png
file_4.png
name
----------
file_1.png

If I additionally add:

INSERT INTO tags (name) VALUES ('greenish_blue_sky');
INSERT INTO file_tags (file_id, tag_id) VALUES(file3_id, greenish_blue_sky_id);

Then the last output part is:

name
----------
file_1.png
file_3.png

Using SQLite 3.18.0