MongoDB Atlas Search seems to ignore synonyms when using some analyzers

Say I just have one document in a collection

{
    _id: <whatever>,
    sound: 'Dong'
}

and a synonyms collection with only one mapping

{
    mappingType: 'explicit',
    input: ['Ding'],
    synonyms: ['Ding', 'Dong']
}

and I want to create a search index that uses this mapping to return the one document when querying for 'Ding' on the sound property.

In this minimal example, I can just use the lucene.standard analyzer and everything works perfectly (lucene.english works as well). But changing only the analyzer definitions to lucene.keyword breaks things, i.e. no document is returned. (The same happens with custom analyzers, but there I might be making another mistake.) The definitions are pretty straightforward. The search index field definition is

  "sound": {
    "analyzer": "lucene.keyword",
    "searchAnalyzer": "lucene.keyword",
    "type": "string"
  },

and synonyms

  "synonyms": [
    {
      "analyzer": "lucene.keyword",
      "name": "synonym_mapping",
      "source": {
        "collection": "synonyms"
      }
    }
  ]
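
The query itself is not shown here. For reference, a query against such an index would look roughly like the sketch below. The index name "default" and the collection name sounds are assumptions made for illustration; only the field name and the synonym mapping name come from the definitions above.

```javascript
// Sketch of a $search stage that uses the synonym mapping defined above.
// "default" (index name) is an assumption, not taken from the question.
const pipeline = [
  {
    $search: {
      index: "default",
      text: {
        query: "Ding",
        path: "sound",
        // Must match the "name" of the entry in the index's "synonyms" array.
        synonyms: "synonym_mapping",
      },
    },
  },
];

// In mongosh, assuming the document lives in a collection called "sounds":
// db.sounds.aggregate(pipeline);
```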

Using MongoDB Compass to explain the query, I can see that the explain output for lucene.standard and lucene.english looks slightly different from the output for the non-working analyzers: the former show type: "DefaultQuery" and "queryType": "SafeTermAutomatonQueryWrapper" (which sounds like a wrapper used for synonyms, maybe?), while the latter show type: "TermQuery". But there is no documentation on what all of this means.

At this point, my best guess is that either some analyzers are not supposed to work with synonyms (although I couldn't find anything about this in the docs, and there is no error or warning either), or the implementation to handle that case is missing.

Am I doing something wrong?

Answer by oli:

I think I somewhat understand the behavior now. The following applies to the use case from the question with the lucene.keyword analyzer. What I think happens is this:

  1. The query asks for sound: 'Ding'.
  2. 'Ding' is converted to lowercase, and synonyms are looked up for 'ding'. This is the crucial step, and it is contrary to lucene.keyword behavior.
  3. No synonyms are found for 'ding', so the search returns no results.

So if I change my synonyms to

{
    mappingType: 'explicit',
    input: ['ding'],
    synonyms: ['Ding', 'Dong']
}

I can find documents with 'Ding' or 'Dong', but here the case matters again, because that is lucene.keyword behavior.
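
The behavior I suspect can be written down as a small toy model. This is not Atlas Search code, just plain JavaScript that encodes my guess: the query term is lowercased for the synonym lookup, while the expanded terms are matched against the stored value case-sensitively, as lucene.keyword would. All function names here are made up for illustration.

```javascript
// Toy model of the hypothesized behavior. NOT how Atlas Search is implemented.

// Keyword-style analysis: the whole string is a single token, case preserved.
const keywordAnalyze = (text) => [text];

// Build a lookup table from 'explicit' synonym documents: input -> synonyms.
function buildSynonymTable(synonymDocs) {
  const table = new Map();
  for (const doc of synonymDocs) {
    for (const input of doc.input) {
      table.set(input, doc.synonyms);
    }
  }
  return table;
}

// Assumption: the lookup key is the lowercased query, but the resulting
// terms are compared to the indexed tokens with case intact.
function search(docs, synonymDocs, query) {
  const table = buildSynonymTable(synonymDocs);
  const key = query.toLowerCase();
  const terms = table.has(key) ? table.get(key) : [query];
  return docs.filter((doc) =>
    keywordAnalyze(doc.sound).some((token) => terms.includes(token))
  );
}

const docs = [{ sound: "Dong" }];
const original = [
  { mappingType: "explicit", input: ["Ding"], synonyms: ["Ding", "Dong"] },
];
const lowercasedInput = [
  { mappingType: "explicit", input: ["ding"], synonyms: ["Ding", "Dong"] },
];
const allLowercase = [
  { mappingType: "explicit", input: ["ding"], synonyms: ["ding", "dong"] },
];

console.log(search(docs, original, "Ding").length);        // 0: 'ding' is not an input
console.log(search(docs, lowercasedInput, "Ding").length); // 1: expands to 'Ding', 'Dong'
console.log(search(docs, allLowercase, "Ding").length);    // 0: 'dong' does not equal 'Dong'
```

Under this model, the input has to be lowercase for the lookup to succeed, while the synonyms have to match the stored casing exactly, which is consistent with what I observed.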

I guess this somewhat makes sense, because I read that Lucene (always?) converts queries to lowercase when parsing them. But since this conflicts with the behavior of lucene.keyword, it is pretty confusing, to me anyway. lucene.standard and similar analyzers are not affected, because they ignore case anyway when they look something up.

What I will use in the end is a custom analyzer that behaves like a case-insensitive lucene.keyword, combined with lowercase synonyms, since I don't care about case but otherwise want to match multi-word queries.
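
A minimal sketch of what such an index definition could look like, assuming a custom analyzer built from the keyword tokenizer and the lowercase token filter. The analyzer name keywordLowercase is made up, and I have not verified that this exact definition behaves as intended together with synonyms, so treat it as a starting point only.

```javascript
// Sketch of an index definition with a case-insensitive keyword analyzer.
// "keywordLowercase" is a hypothetical name chosen for this example.
const indexDefinition = {
  analyzers: [
    {
      name: "keywordLowercase",
      tokenizer: { type: "keyword" },        // whole input becomes one token
      tokenFilters: [{ type: "lowercase" }], // then the token is lowercased
    },
  ],
  mappings: {
    dynamic: false,
    fields: {
      sound: {
        type: "string",
        analyzer: "keywordLowercase",
        searchAnalyzer: "keywordLowercase",
      },
    },
  },
  synonyms: [
    {
      name: "synonym_mapping",
      analyzer: "keywordLowercase",
      source: { collection: "synonyms" },
    },
  ],
};

// Matching synonym document, with every entry in lowercase.
const synonymDoc = {
  mappingType: "explicit",
  input: ["ding"],
  synonyms: ["ding", "dong"],
};
```

Because the analyzer lowercases both the indexed value and the query, the synonyms can be kept entirely in lowercase as well.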