I am working on a people search tool that uses SOLR for indexing and fuzzy search across multiple fields (with edismax). The setup uses various filters such as SynonymFilterFactory and WordDelimiterFilterFactory, and TF-IDF is disabled.
This works very well, except for a few cases where a search term is matched multiple times within one document. For example, searching for "Martin XXXX" ranks "Marvin Martin" highest because the fuzzy term "Martin" matches both "Marvin" and "Martin", and the two scores are added together.
In general, matching a search term against multiple words in a document makes a lot of sense. For people search, however, I'd like each search term to contribute only its maximum score, i.e. each search term should map to at most one word in the document (the person's name or other information).
Is there a mechanism in SOLR/Lucene which would allow me to force a one-to-one mapping between search term and matched term?
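To make the desired behaviour concrete, here is a minimal Python sketch of the two scoring schemes. It is not Solr code: the function names, the 0.8 threshold, and the use of difflib's ratio as a stand-in for fuzzy similarity are all assumptions made purely for illustration.

```python
from difflib import SequenceMatcher


def similarity(term, word):
    # Stand-in for fuzzy matching (an assumption, not Lucene's actual measure).
    return SequenceMatcher(None, term.lower(), word.lower()).ratio()


def sum_of_matches(query_terms, doc_words, threshold=0.8):
    # Roughly the current behaviour: every fuzzy hit adds to the score.
    total = 0.0
    for term in query_terms:
        for word in doc_words:
            score = similarity(term, word)
            if score >= threshold:
                total += score
    return total


def best_match_per_term(query_terms, doc_words, threshold=0.8):
    # Desired behaviour: each search term contributes only its best hit.
    total = 0.0
    for term in query_terms:
        best = max((similarity(term, w) for w in doc_words), default=0.0)
        if best >= threshold:
            total += best
    return total


query = ["martin"]
marvin_martin = ["marvin", "martin"]
martin_smith = ["martin", "smith"]

# Summing lets "Marvin Martin" outrank "Martin Smith" ...
print(sum_of_matches(query, marvin_martin), sum_of_matches(query, martin_smith))
# ... whereas taking the best match per term scores both documents equally.
print(best_match_per_term(query, marvin_martin), best_match_per_term(query, martin_smith))
```

With the second scheme, a document no longer gains anything from containing several near-matches of the same search term, which is the behaviour I am after.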
The issue is visible in the debug output for the query:
0.27641854 = (MATCH) sum of:
  0.27641854 = (MATCH) sum of:
    0.15077375 = (MATCH) weight(FullName:martin in 118169) [NoTFIDFSimilarityClass], result of:
      0.15077375 = score(doc=118169,freq=1.0 = termFreq=1.0
), product of:
        0.15077375 = queryWeight, product of:
          1.0 = idf(docFreq=1619, maxDocs=328317)
          0.15077375 = queryNorm
        1.0 = fieldWeight in 118169, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          1.0 = idf(docFreq=1619, maxDocs=328317)
          1.0 = fieldNorm(doc=118169)
    0.12564479 = (MATCH) weight(FullName:marvin^0.8333333 in 118169) [NoTFIDFSimilarityClass], result of:
      0.12564479 = score(doc=118169,freq=1.0 = termFreq=1.0
), product of:
        0.12564479 = queryWeight, product of:
          0.8333333 = boost
          1.0 = idf(docFreq=105, maxDocs=328317)
          0.15077375 = queryNorm
        1.0 = fieldWeight in 118169, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          1.0 = idf(docFreq=105, maxDocs=328317)
          1.0 = fieldNorm(doc=118169)
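For reference, the arithmetic in that debug output can be reproduced as follows. This is only a sketch built from the numbers shown above; the variable names are mine, and the final line shows the score I would like to get instead (the maximum rather than the sum).

```python
import math

# Values copied from the debug output above.
query_norm = 0.15077375
marvin_boost = 0.8333333  # boost applied to the fuzzy expansion "marvin"

# idf, tf and fieldNorm are all 1.0 here, so each clause reduces to
# boost * queryNorm.
martin_score = 1.0 * query_norm
marvin_score = marvin_boost * query_norm

# What Solr currently reports: the two clauses are summed.
summed = martin_score + marvin_score
assert math.isclose(summed, 0.27641854, abs_tol=1e-7)

# What I would like instead: only the best expansion counts.
wanted = max(martin_score, marvin_score)
print(summed, wanted)
```

So the exact match on "Martin" alone would give 0.15077375, and the extra 0.12564479 comes entirely from the same search term also matching "Marvin".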
An example query:
http://domain/solr/peoplefinder/select?q=Martin~&wt=json&indent=true&defType=edismax&qf=FullName&stopwords=true&lowercaseOperators=true&debug=true