I am trying to add a fuzzy search function to a Django project without using Elasticsearch.
1. I am using PostgreSQL, so I first tried levenshtein, but it did not work for my purpose.
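from django.db.models import F, Func

class Levenshtein(Func):
    # Renders as: levenshtein(<column>, '<search_term>')
    # Note: search_term is interpolated into the SQL string unescaped.
    template = "%(function)s(%(expressions)s, '%(search_term)s')"
    function = "levenshtein"

    def __init__(self, expression, search_term, **extras):
        super().__init__(expression, search_term=search_term, **extras)

# Keep only rows whose sort_name is within 2 edits of the search term.
items = Product.objects.annotate(
    lev_dist=Levenshtein(F('sort_name'), searchterm)
).filter(lev_dist__lte=2)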
Searching for "glyoxl" did not pick up "4-Methylphenylglyoxal hydrate", because levenshtein
measures the distance to the whole value, so "glyoxl" is never matched against just the end of the long word "Methylphenylglyoxal".
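To make the mismatch concrete, here is a minimal pure-Python sketch of the classic edit-distance calculation. It is only an illustration of the idea, not PostgreSQL's implementation, and the helper name `levenshtein` is chosen here for convenience. Because the product name is 23 characters longer than the search term, the distance can never fall under a cutoff of 2.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


name = "4-Methylphenylglyoxal hydrate"
print(levenshtein("glyoxl", "glyoxal"))  # small: one missing letter
print(levenshtein("glyoxl", name))       # large: dominated by the length gap
```

A whole-string distance therefore only helps when the search term and the stored value have similar lengths.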
2. trigram_similar gave odd results and was slow.
items = Product.objects.filter(sort_name__trigram_similar=searchterm)
"phnylglyoxal" did not pick up "4-Methylphenylglyoxal hydrate", but picked up some other similar terms: "4-Hydroxyphenylglyoxal hydrate", "2,4,6-Trimethylphenylglyoxal hydrate"
"glyoxl" did not pick up any of the above terms
3. The Python package fuzzywuzzy looks like it can solve my problem, but I was not able to incorporate it into a query.
from fuzzywuzzy import fuzz
ratio = fuzz.partial_ratio('glyoxl', '4-Methylphenylglyoxal hydrate')
# ratio = 83
I tried to use the fuzz.partial_ratio function in annotate, but it did not work.
items = Product.objects.annotate(
    ratio=fuzz.partial_ratio(searchterm, 'full_name')
).filter(ratio__gte=75)
Here is the error message:
QuerySet.annotate() received non-expression(s): 12.
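The 12 in the message is presumably the integer that fuzz.partial_ratio returned for searchterm against the literal string 'full_name': Python evaluates the call before annotate ever sees it, so the database column is never involved. One possible fallback, sketched below under assumptions, is to score the names in Python after fetching them. To keep the sketch self-contained it uses a stand-in `partial_ratio` built on the standard library's difflib rather than fuzzywuzzy itself, so scores may not match fuzzywuzzy exactly. The helper `fuzzy_ids` and the sample rows are hypothetical.

```python
from difflib import SequenceMatcher


def partial_ratio(term, text):
    """Stand-in for fuzz.partial_ratio: best 0-100 score of the shorter
    string against every same-length window of the longer one.
    Unlike fuzzywuzzy, this version lowercases both inputs."""
    term, text = term.lower(), text.lower()
    if len(term) > len(text):
        term, text = text, term
    if not term:
        return 0
    best = 0.0
    for start in range(len(text) - len(term) + 1):
        window = text[start:start + len(term)]
        best = max(best, SequenceMatcher(None, term, window).ratio())
    return round(100 * best)


def fuzzy_ids(rows, term, threshold=75):
    """Return the primary keys whose name scores at least `threshold`.
    `rows` is any iterable of (pk, name) pairs."""
    return [pk for pk, name in rows if partial_ratio(term, name) >= threshold]


# Hypothetical sample data standing in for database rows.
rows = [(1, "4-Methylphenylglyoxal hydrate"), (2, "Sodium chloride")]
print(fuzzy_ids(rows, "glyoxl"))
```

In a Django project the pairs could come from `Product.objects.values_list('pk', 'full_name')`, and the resulting ids could be passed to `Product.objects.filter(pk__in=ids)`. This pulls every name into Python, so it is likely to be slow on a large table.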
According to this Stack Overflow post (1), annotate does not take regular Python functions. The post also mentioned that from Django 2.1, one can subclass Func to generate a custom function. But it seems that Func can only wrap database functions such as levenshtein.
Is there any way to solve these problems? Thanks!