Say I have just one document in a collection:
{
_id: <whatever>,
sound: 'Dong'
}
and a synonyms collection with only one mapping:
{
mappingType: 'explicit',
input: ['Ding'],
synonyms: ['Ding', 'Dong']
}
and I want to create a search index that uses these to return that single document when someone queries for 'Ding' on the property sound.
In this minimal example I can just use the lucene.standard analyzer and everything works perfectly (lucene.english works as well). But changing only the analyzer definitions to lucene.keyword (or to custom analyzers, though there I might be making another mistake) breaks things, i.e. no document is returned. The definitions are pretty straightforward; the search index field definition:
"sound": {
"analyzer": "lucene.keyword",
"searchAnalyzer": "lucene.keyword",
"type": "string"
},
and the synonyms definition:
"synonyms": [
{
"analyzer": "lucene.keyword",
"name": "synonym_mapping",
"source": {
"collection": "synonyms"
}
}
]
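For reference, the two fragments above can be combined into a single index definition, and the query for 'Ding' would then be sent as a $search aggregation stage using the text operator's synonyms option. This is a minimal Python sketch under assumptions: the index name "default", the "dynamic": False mapping, and the helpers build_pipeline and find_by_sound are hypothetical and not from the original; collection is expected to be anything with an aggregate method, such as a pymongo Collection.

```python
import json

# Hypothetical full index definition combining the two fragments above.
# "dynamic": False is an assumption; the question only shows the "sound" field.
INDEX_DEFINITION = {
    "mappings": {
        "dynamic": False,
        "fields": {
            "sound": {
                "type": "string",
                "analyzer": "lucene.keyword",
                "searchAnalyzer": "lucene.keyword",
            }
        },
    },
    "synonyms": [
        {
            "name": "synonym_mapping",
            "analyzer": "lucene.keyword",
            "source": {"collection": "synonyms"},
        }
    ],
}


def build_pipeline(query: str, index_name: str = "default") -> list:
    """Build a $search pipeline that expands `query` via the synonym mapping."""
    return [
        {
            "$search": {
                "index": index_name,  # assumed index name
                "text": {
                    "query": query,
                    "path": "sound",
                    "synonyms": "synonym_mapping",  # must match the mapping name
                },
            }
        }
    ]


def find_by_sound(collection, query: str) -> list:
    """Hypothetical helper: run the search on e.g. a pymongo Collection."""
    return list(collection.aggregate(build_pipeline(query)))


if __name__ == "__main__":
    print(json.dumps(build_pipeline("Ding"), indent=2))
```

The synonyms value in the query refers to the mapping by its name, so it has to match the "name" in the index definition exactly.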
Using MongoDB Compass to explain the query, I can see that the explain output for lucene.standard and lucene.english looks slightly different (type: "DefaultQuery" with "queryType": "SafeTermAutomatonQueryWrapper", which sounds like a wrapper for synonyms, maybe?) from the output for the non-working analyzers (type: "TermQuery"), but there is no documentation on what any of this means.
At this point, my best guess is that either some analyzers are not supposed to work with synonyms (I couldn't find anything about that in the docs, and there is obviously no error or warning either), or the implementation to handle that case is missing.
Am I doing something wrong?
I think I somewhat understand the behavior now. The following starts with the use case from the question, with the lucene.keyword analyzer. What I think happens is:

1. The query is for sound: 'Ding'.
2. 'Ding' is converted to lowercase. This is the extra important step, and it is contrary to lucene.keyword behavior.
3. Synonyms are looked up for 'ding'.
4. No synonyms are found for 'ding', so the search returns no results.

So if I change my synonyms so that the input is lowercase ('ding'), I can find documents with 'Ding' or 'Dong', but here the case matters again, because that is lucene.keyword behavior. I guess it somewhat makes sense, because I read that Lucene (always?) converts queries to lowercase, but since this conflicts with the behavior of lucene.keyword, it is pretty confusing, to me anyway. lucene.standard and similar analyzers are not affected, because they ignore case anyway when they look something up.

What I will use in the end is a custom analyzer that behaves like a case-insensitive lucene.keyword, since I don't care about the case but otherwise want to match multi-word queries the way lucene.keyword does, together with lowercase synonyms.
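The hypothesis above can be checked against a small Python model. This is not how Atlas Search is implemented; it only encodes the steps described: the stored value is one token whose case is kept (lucene.keyword), the query is lowercased before the synonym lookup, mapping inputs are compared as stored, and the synonym outputs are matched through the field's analyzer. It handles explicit mappings only. The function names (keyword, keyword_lowercase, search) are hypothetical, and the CUSTOM_ANALYZER dict is a sketch in the Atlas Search custom analyzer format (keyword tokenizer plus lowercase token filter) whose name "keywordLowercase" is an assumption.

```python
from typing import Callable, Dict, List

Analyzer = Callable[[str], str]
Mapping = Dict[str, List[str]]  # explicit mapping: {"input": [...], "synonyms": [...]}


def keyword(text: str) -> str:
    """Like lucene.keyword: the whole value is one token, case preserved."""
    return text


def keyword_lowercase(text: str) -> str:
    """Like the planned custom analyzer: one token, lowercased."""
    return text.lower()


def search(docs: List[str], query: str, mappings: List[Mapping],
           analyzer: Analyzer) -> List[str]:
    """Model of the hypothesized synonym lookup on a single string field."""
    # Hypothesized extra step: the query is lowercased before the synonym
    # lookup, regardless of the analyzer.
    lookup_key = query.lower()
    # Mapping inputs are compared as stored, so an input of 'Ding' never
    # matches the lookup key 'ding'.
    expansion = next((m["synonyms"] for m in mappings
                      if lookup_key in m["input"]), None)

    # No mapping found: only the query itself is searched.
    terms = expansion if expansion is not None else [query]
    # Terms go through the analyzer, so with lucene.keyword the case of the
    # synonyms has to match the stored values exactly.
    wanted = {analyzer(t) for t in terms}
    return [doc for doc in docs if analyzer(doc) in wanted]


# Sketch of an Atlas Search custom analyzer that behaves like keyword_lowercase
# (the name is an assumption); it would go in the index's "analyzers" array.
CUSTOM_ANALYZER = {
    "name": "keywordLowercase",
    "tokenizer": {"type": "keyword"},
    "tokenFilters": [{"type": "lowercase"}],
}


if __name__ == "__main__":
    docs = ["Dong"]
    original = [{"input": ["Ding"], "synonyms": ["Ding", "Dong"]}]
    lowercase_input = [{"input": ["ding"], "synonyms": ["Ding", "Dong"]}]
    all_lowercase = [{"input": ["ding"], "synonyms": ["ding", "dong"]}]
    print(search(docs, "Ding", original, keyword))                   # []
    print(search(docs, "Ding", lowercase_input, keyword))            # ['Dong']
    print(search(docs, "Ding", all_lowercase, keyword))              # []
    print(search(docs, "Ding", all_lowercase, keyword_lowercase))    # ['Dong']
```

In this model the third case fails for the same reason described above (the lowercase synonyms no longer match the case-preserving index), and the last case shows why a case-insensitive keyword analyzer combined with all-lowercase synonyms avoids both problems.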