Multi-valued field similarity scoring in Lucene: taking the AVG or MAX of per-value scores

Is there any way to modify Lucene's default similarity scoring function to support searching multi-valued fields? That is, for a document that has three values in its "persons" field, there would be three separate similarity scores, one per person.

For example, consider indexing a paper as one document, where each author may have multiple aliases:

Person 1: David Bowie, David Robert Jones, Ziggy Stardust, Thin White Duke

Person 2: David Letterman

Person 3: David Hasselhoff, David Michael Hasselhoff

When searching for "David", can we get back three different similarity scores, such that Score(Person 2) > Score(Person 3) > Score(Person 1)?

Furthermore, can we implement an Indri-style MAX or AVG operator, where MAX(document) = Score(Person 2) and AVG(document) = AVG{Score(Person 1), Score(Person 2), Score(Person 3)}?
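To make the desired behavior concrete, here is a minimal sketch in plain Java with no Lucene dependency. The class `MultiValuedScoring` and its methods are hypothetical names, and the per-value score (the fraction of tokens in a field value that match the query term) is only an assumed stand-in for a length-normalized similarity, not Lucene's actual formula. It scores each "persons" value independently and then combines the per-value scores with MAX or AVG.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class MultiValuedScoring {

    // Toy per-value score (an assumption, NOT Lucene's formula):
    // fraction of tokens in the field value that equal the query term.
    static double scoreValue(String fieldValue, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        // Split on anything that is not a letter or digit.
        String[] tokens = fieldValue.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        int total = 0;
        int matches = 0;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // a leading delimiter can yield an empty token
            }
            total++;
            if (token.equals(needle)) {
                matches++;
            }
        }
        return total == 0 ? 0.0 : (double) matches / total;
    }

    // One score per value of the multi-valued field, in input order.
    static double[] scoreValues(List<String> values, String term) {
        double[] scores = new double[values.size()];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = scoreValue(values.get(i), term);
        }
        return scores;
    }

    // MAX operator: the document score is the best per-value score.
    static double max(double[] scores) {
        return Arrays.stream(scores).max().orElse(0.0);
    }

    // AVG operator: the document score is the mean of the per-value scores.
    static double avg(double[] scores) {
        return Arrays.stream(scores).average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<String> persons = Arrays.asList(
                "David Bowie, David Robert Jones, Ziggy Stardust, Thin White Duke",
                "David Letterman",
                "David Hasselhoff, David Michael Hasselhoff");
        double[] scores = scoreValues(persons, "David");
        System.out.println("per-person scores: " + Arrays.toString(scores));
        System.out.println("MAX(document) = " + max(scores));
        System.out.println("AVG(document) = " + avg(scores));
    }
}
```

Under this toy score, Person 2 ranks above Person 3, which ranks above Person 1, because the single match in a short value outweighs two matches spread over many alias tokens. What I am looking for is the equivalent hook inside Lucene's scoring.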

Any pointers to which parts of the Lucene implementation could be modified would be appreciated. Thanks.
