Switching from regex to MongoDB Atlas Search for live search: how to improve search results


On the recommendation of a MongoDB Atlas consultant, I am switching our application's live-search feature from regex to Atlas Search. Here are the old and new routes:

Old live-search approach, using regex:

router.get('/live-search/text/:text/regex', function (req, res) { // regex search
    try {
        const text = req.params.text; // e.g. 'stanfo'
        // Escape regex metacharacters so input such as "(" cannot break the query
        const escaped = text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        const queryFilters = { label: { $regex: escaped, $options: 'i' } };

        // Return the first 20 matches (regex has no relevance ranking)
        db.gs__ptgc_selects
            .find(queryFilters)
            .limit(20)
            .then(data => res.json(data))
            .catch(err => res.status(400).json('Error: ' + err));
    } catch (error) {
        console.log('error: ', error);
        res.status(500).json({ statusCode: 500, message: error.message });
    }
});

New live-search approach, using Atlas Search:

router.get('/live-search/text/:text/atlas', function (req, res) { // atlas search
    try {
        let text = req.params.text;
        let queryFilters = [
            {
                $search: {
                    index: 'default_search', // optional, defaults to "default"
                    autocomplete: { query: `${text}`, path: 'label' } // "tokenOrder": "any|sequential", "fuzzy": <options>, "score": <options>
                }
            },
            { $limit: 20 }
        ];

        // And Return Top 20
        db.gs__ptgc_selects
            .aggregate(queryFilters)
            .then(data => res.json(data))
            .catch(err => res.status(400).json('Error: ' + err));
    } catch (error) {
        console.log('error: ', error);
        res.status(500).json({ statusCode: 500, message: error.message });
    }
});
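One way to get teams first, sketched under assumptions: keep the autocomplete search, then re-sort a pool of the most relevant matches by a type rank before cutting to 20. The `TYPE_RANK` values, the `candidates` pool size, and the `buildRankedPipeline` name are hypothetical. Because the index maps only `label`, this sorts on `type` after `$search` rather than inside it, which works because `$search` returns full documents unless `returnStoredSource` is set.

```javascript
// Hypothetical ranking: a lower number sorts earlier.
const TYPE_RANK = { team: 0, player: 1, game: 2 };

// Builds an aggregation pipeline that runs the autocomplete search, keeps
// the most relevant candidates, then orders them teams first and, within
// each type, by search score.
function buildRankedPipeline(text, { limit = 20, candidates = 200 } = {}) {
  return [
    {
      $search: {
        index: 'default_search',
        autocomplete: { query: text, path: 'label' }
      }
    },
    // $search emits results in descending score order, so this keeps the
    // `candidates` most relevant matches before they are re-sorted.
    { $limit: candidates },
    {
      $addFields: {
        score: { $meta: 'searchScore' },
        typeRank: {
          $switch: {
            branches: Object.entries(TYPE_RANK).map(([type, rank]) => ({
              case: { $eq: ['$type', type] },
              then: rank
            })),
            // Unknown or missing types sort after all known ones.
            default: Object.keys(TYPE_RANK).length
          }
        }
      }
    },
    // _id breaks ties so the order is stable between requests.
    { $sort: { typeRank: 1, score: -1, _id: 1 } },
    { $limit: limit }
  ];
}
```

In the `/atlas` route, `aggregate(buildRankedPipeline(text))` would take the place of `aggregate(queryFilters)`. Re-sorting only the top candidates keeps the `$sort` cheap, at the cost of possibly missing a team that scores below the pool. If `type` were added to the index mappings, a `compound` query with a boosted `should` clause on `type` would be another option.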

For the new approach, we created a search index named default_search in the MongoDB Atlas UI. Here are the resulting mappings for that index:

{
    "mappings": {
        "dynamic": false,
        "fields": {
            "label": {
                "maxGrams": 5,
                "minGrams": 3,
                "tokenization": "nGram",
                "type": "autocomplete"
            }
        }
    },
    "storedSource": {
        "include": [
            "label"
        ]
    }
}
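A sketch of a revised default_search definition, written as a JavaScript object; the same structure can be pasted as JSON into the Atlas UI's index editor. The values are assumptions to experiment with, not tested settings: `edgeGram` with `minGrams: 2` and `maxGrams: 15` (the Atlas defaults for the autocomplete type), plus a keyword-analyzed `type` field so that `type` could be used in compound queries. The `defaultSearchDefinition` name is hypothetical.

```javascript
// Hypothetical revised definition for the default_search index.
const defaultSearchDefinition = {
  mappings: {
    dynamic: false,
    fields: {
      label: {
        type: 'autocomplete',
        tokenization: 'edgeGram', // grams anchored at the start of each word
        minGrams: 2,              // lets two-letter words such as "Ca" match
        maxGrams: 15
      },
      // Indexing `type` as a single keyword token makes it usable in
      // compound queries, e.g. to boost or filter on 'team'.
      type: { type: 'string', analyzer: 'lucene.keyword' }
    }
  },
  storedSource: { include: ['label', 'type'] }
};

// Assumed usage with the MongoDB Node.js driver 6.x:
// await collection.updateSearchIndex('default_search', defaultSearchDefinition);
```

Switching from `nGram` to `edgeGram` changes matching from "any substring" to "start of a word", which is closer to how a live search box is usually typed into, but it would stop matching mid-word fragments.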

Simply put, the search results from Atlas Search with these index mappings are not as good as the results from regex. For reference, we are searching over the label field in a collection of 200,000 labels for basketball players, teams, and games that looks like this:

search_over = [
  { _id: 'jadkfl', label: 'M: Stanford Cardinal', type: 'team' },
  { _id: 'afdacc', label: 'W: Stanford Cardinal', type: 'team' },
  { _id: 'adsjkf', label: 'Cameron Brink: Stanford', type: 'player' },
  { _id: 'aidjaf', label: 'M: 2023-02-03: Stanford vs Montana', type: 'game' },
  { _id: 'uiuass', label: 'Tam Stanford: Hood', type: 'player' },
  ...
]

Here are the search results for "stanfo" with regex:

[Screenshot: regex search results for "stanfo"]

Here are the search results for "stanfo" with Atlas Search:

[Screenshot: Atlas Search results for "stanfo"]

Comparing these results, my two biggest concerns with the Atlas Search results are actually fairly minor:

  1. I would like the matching teams, M: Stanford Cardinal and W: Stanford Cardinal, to be the top 2 results, as they are with regex but not with Atlas Search.

  2. If I search for Stanford Ca, Atlas Search returns an empty array, presumably because the mappings set minGrams to 3 and the second word, Ca, has only two letters. It still seems strange that the whole query Stanford Ca matches nothing.
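For the Stanford Ca case, a sketch of one workaround, assuming the empty result comes from the two-letter word producing no indexed grams: split the input into words, require every word at least `minGrams` long to match through its own autocomplete clause, and skip shorter fragments. `buildWordwiseSearch` and its options are hypothetical; `compound` and the autocomplete `tokenOrder: 'sequential'` option are existing Atlas Search features, the latter used here only as an optional boost.

```javascript
// Hypothetical helper: splits the user's input into words and requires each
// word long enough to be indexed (>= minGrams) to match `label`.
// Shorter fragments are skipped so they cannot empty the result set.
function buildWordwiseSearch(text, { minGrams = 3, path = 'label' } = {}) {
  const trimmed = text.trim();
  const words = trimmed.split(/\s+/).filter(word => word.length >= minGrams);
  if (words.length === 0) return null; // nothing searchable yet

  return {
    $search: {
      index: 'default_search',
      compound: {
        // Every sufficiently long word must match.
        must: words.map(word => ({ autocomplete: { query: word, path } })),
        // Optional: scores higher when the words appear in the typed order.
        should: [
          { autocomplete: { query: trimmed, path, tokenOrder: 'sequential' } }
        ]
      }
    }
  };
}
```

In the route, a `null` return would mean responding with `[]` without querying; otherwise the stage would be followed by `{ $limit: 20 }`. With `minGrams` still at 3, Stanford Ca would then return Stanford matches in general rather than only labels with a word containing "ca"; lowering `minGrams` to 2 in the index (and in the helper's option) is what would let "Ca" narrow the results.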

Can I improve the /atlas route so that results are sorted by the type field with teams first? How can I make sure Stanford Ca doesn't return an empty array? Is it safe to lower minGrams from 3 to 1?
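To gauge what lowering minGrams costs, the number of grams indexed per word can be estimated. This sketch assumes a plain per-word sliding window (every substring between minGrams and maxGrams characters for `nGram`, one gram per size for edge tokenizations); Atlas's tokenizer may differ in details, and `gramsPerWord` is a hypothetical helper.

```javascript
// Estimated number of grams indexed for one word of `length` characters,
// assuming grams are generated per word with a simple sliding window.
function gramsPerWord(length, minGrams, maxGrams, tokenization = 'nGram') {
  let count = 0;
  // Gram sizes run from minGrams up to maxGrams, capped at the word length.
  for (let size = minGrams; size <= Math.min(maxGrams, length); size++) {
    // nGram: every substring of this size.
    // edgeGram / rightEdgeGram: a single prefix or suffix of this size.
    count += tokenization === 'nGram' ? length - size + 1 : 1;
  }
  return count;
}
```

Under that assumption, "Stanford" (8 characters) yields 15 grams with the current nGram settings (3 to 5) and 30 with minGrams set to 1, roughly doubling the indexed terms for that word, while a two-letter word such as "Ca" yields none at minGrams 3. One-character grams would also match nearly every label, which tends to flood results with weak matches; for comparison, Atlas's own defaults for autocomplete fields are edgeGram with minGrams 2.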
