How does the relevance score work in full-text search in MemSQL?

We are using MemSQL for full-text search, and we depend on the relevance score it returns. I was under the impression that the full-text search relevance score reflects how closely the input matches what is stored in the database. I am searching for a string that is exactly the same as one entry in the database; there are also a few entries that are similar to the input string but do not match it completely. I expected a higher relevance score for the record that matches the input exactly and a lower relevance score for the partial matches. However, I am getting the same relevance score for all of these records. Please see the example below.

The following are my column values:

line_2
Taman Rawang Idaman
Taman Rawang Putra
Taman Rawang Tin
Taman Rawang
Taman Rawang Jaya
Taman Rawang Perdana

Below is my query.

SELECT line_2, MATCH (line_2) AGAINST ('Taman Rawang Jaya') line_2_relevance
FROM GEO_SOURCE
WHERE MATCH (line_2) AGAINST ('Taman Rawang Jaya') >= 0.5
ORDER BY line_2_relevance desc

Output

line_2                 line_2_relevance
Taman Rawang Perdana   1
Taman Rawang Jaya      1
Taman Rawang           1
Taman Rawang Putra     1
Taman Sri Rawang       1
Taman Rawang Tin       1
Taman Rawang Idaman    1

As you can see, even though the input matches one database entry exactly, the partially matching records were ranked just as high. Can anyone please explain how the relevance score is calculated? Does MemSQL take into account the order in which the queried terms occur in the stored text?

MemSQL Version 7.1.8

There are 1 best solutions below

SingleStore's implementation is based on CLucene, which is a port of the Java Lucene system to C++. Here's what I was able to find about scoring, from Apache Lucene:

It's really not all that unusual to see the first result be an order of magnitude higher in score than the second, even though the second is still a meaningful, interesting result. There isn't any guarantee of an even distribution of scores, so we don't know what the 10% figure means. And Lucene's scoring algorithm tends to err on the side of making the differences nice and big.
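To make the scoring discussion a bit more concrete, here is a rough Python sketch of the classic Lucene TF-IDF idea (term frequency, inverse document frequency, a length norm, and a coordination factor). This is only an illustration under assumptions: the function names `tokenize` and `classic_score` are hypothetical, tokenization is a plain lowercase whitespace split, and query normalization and boosts are left out. It is not what MemSQL actually executes, and it does not reproduce the flat scores of 1 shown in the question.

```python
import math

def tokenize(text):
    # Assumption: lowercase whitespace split, not a real Lucene analyzer.
    return text.lower().split()

def classic_score(query, doc, corpus):
    """Simplified classic TF-IDF score of one document for one query.

    Hypothetical helper for illustration only; queryNorm and boosts are omitted.
    """
    q_terms = tokenize(query)
    d_terms = tokenize(doc)
    if not q_terms or not d_terms:
        return 0.0
    n_docs = len(corpus)
    total = 0.0
    matched = 0
    for term in q_terms:
        freq = d_terms.count(term)
        if freq == 0:
            continue
        matched += 1
        # Number of documents in the corpus that contain the term.
        doc_freq = sum(1 for d in corpus if term in tokenize(d))
        idf = 1.0 + math.log(n_docs / (doc_freq + 1))
        tf = math.sqrt(freq)
        total += tf * idf * idf
    # Shorter fields weigh more; documents matching more query terms weigh more.
    length_norm = 1.0 / math.sqrt(len(d_terms))
    coord = matched / len(q_terms)
    return coord * length_norm * total

# Column values taken from the question.
corpus = [
    "Taman Rawang Idaman",
    "Taman Rawang Putra",
    "Taman Rawang Tin",
    "Taman Rawang",
    "Taman Rawang Jaya",
    "Taman Rawang Perdana",
]

query = "Taman Rawang Jaya"
ranked = sorted(corpus, key=lambda d: classic_score(query, d, corpus), reverse=True)
for doc in ranked:
    print(f"{doc:22s} {classic_score(query, doc, corpus):.4f}")
```

In this sketch the query is treated as an unordered set of terms, so a document with the same words in a different order gets the same score; only which terms match, how rare they are, and how long the field is make a difference. Whether MemSQL applies additional normalization or capping on top of the underlying score is not something this sketch can tell you.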

I would suggest going through the following link for a better understanding.