Postgres similarity (or text search) matching partial document


Call me an amateur at full-text search. I've read some tutorials, but have now hit a bit of a wall. The following uses Postgres's pg_trgm module:

=> select similarity('Foo', 'Foo Bar');
 similarity 
------------
        0.5

If 'Foo Bar' were a document, it would be a perfect match for the search request 'Foo', yet it scores only 0.5 (let's live with that for the moment). Some may argue for using text search instead: select ts_rank(to_tsvector('Foo Bar'), to_tsquery('Foo')); But text search does not support fuzzy matching (or so I've read), so the following would score a flat zero in text search, whereas similarity still produces a score:

=> select similarity('Foo', 'Foot Bar');
 similarity 
------------
        0.3
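
For reference, both scores above can be traced back to the trigram sets that pg_trgm builds. A minimal sketch, assuming the pg_trgm extension is installed in the current database; show_trgm is the module's helper for listing the trigrams of a string, and the column aliases are only illustrative:

```sql
-- Assumes: CREATE EXTENSION IF NOT EXISTS pg_trgm; has already been run.
-- List the trigrams pg_trgm extracts from the query and from each document.
select show_trgm('Foo')      as query_trigrams,
       show_trgm('Foo Bar')  as doc_trigrams,
       show_trgm('Foot Bar') as typo_doc_trigrams;
```

similarity appears to be the number of shared trigrams divided by the number of distinct trigrams across both strings. 'Foo' yields 4 trigrams, all of which are among the 8 trigrams of 'Foo Bar' (4/8 = 0.5), while only 3 of them occur in 'Foot Bar', out of 10 distinct trigrams overall (3/10 = 0.3).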

0.3 seems like a reasonable degradation of the score, given the one-letter typo (the extra "t"). However, as the document grows in size, the numbers stop making sense:

=> select similarity('Foo', 'Foo Bar Ball Bob Beast Baby Boy'), similarity('Foo', 'Foot Bar');
 similarity | similarity 
------------+------------
 0.16666667 |        0.3

Intuitively, I think the 'Foo Bar Ball...' document is a better match for the 'Foo' search request than the 'Foot Bar' document, but the rank/score does not support this.
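
The drop for the longer document seems to follow from how the score is built: every extra word adds trigrams to the denominator, while the number of trigrams shared with 'Foo' stays fixed. A rough way to check this, assuming the pg_trgm extension is installed, is to count the shared and the distinct trigrams directly with show_trgm:

```sql
-- Assumes: CREATE EXTENSION IF NOT EXISTS pg_trgm; has already been run.

-- Trigrams that occur in both the query and the long document.
select count(*) as shared
from (
  select unnest(show_trgm('Foo'))
  intersect
  select unnest(show_trgm('Foo Bar Ball Bob Beast Baby Boy'))
) s;

-- Distinct trigrams across the query and the long document combined.
select count(*) as distinct_total
from (
  select unnest(show_trgm('Foo'))
  union
  select unnest(show_trgm('Foo Bar Ball Bob Beast Baby Boy'))
) u;
```

These should come out as 4 shared and 24 distinct trigrams, which is consistent with the 4/24 ≈ 0.1667 reported above. So the score measures how much of the whole document the query covers, not whether the query occurs somewhere inside it.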

So, how can I get the ranking/scoring across larger documents that I think text search provides, together with the fuzziness that similarity provides?
