Per the recommendation of a MongoDB Atlas consultant, I am attempting to switch from regex to Atlas Search for our application's live search feature. We have the following old and new routes for this:
Old live-search approach using regex:
router.get('/live-search/text/:text/regex', function (req, res) { // regex search
    try {
        let text = req.params.text; // 's'
        let queryFilters = { label: { $regex: `${text}`, $options: 'i' } };
        // And Return Top 20
        db.gs__ptgc_selects
            .find(queryFilters)
            .limit(20)
            .then(data => res.json(data))
            .catch(err => res.status(400).json('Error: ' + err));
    } catch (error) {
        console.log('error: ', error);
        res.status(500).json({ statusCode: 500, message: error.message });
    }
});
New live-search approach using Atlas Search:
router.get('/live-search/text/:text/atlas', function (req, res) { // atlas search
    try {
        let text = req.params.text;
        let queryFilters = [
            {
                $search: {
                    index: 'default_search', // optional, defaults to "default"
                    autocomplete: { query: `${text}`, path: 'label' } // "tokenOrder": "any|sequential", "fuzzy": <options>, "score": <options>
                }
            },
            { $limit: 20 }
        ];
        // And Return Top 20
        db.gs__ptgc_selects
            .aggregate(queryFilters)
            .then(data => res.json(data))
            .catch(err => res.status(400).json('Error: ' + err));
    } catch (error) {
        console.log('error: ', error);
        res.status(500).json({ statusCode: 500, message: error.message });
    }
});
For the new approach, we created a default_search search index in the MongoDB Atlas UI, and here are the resulting mappings for that default_search index:
{
    "mappings": {
        "dynamic": false,
        "fields": {
            "label": {
                "maxGrams": 5,
                "minGrams": 3,
                "tokenization": "nGram",
                "type": "autocomplete"
            }
        }
    },
    "storedSource": {
        "include": [
            "label"
        ]
    }
}
Simply put, the quality of the search results using Atlas Search with these index mappings is not as good as the results using regex. For reference, we are searching over the label field in a collection with 200,000 labels of basketball players, teams, and games that looks like this:
search_over = [
    { _id: 'jadkfl', label: 'M: Stanford Cardinal', type: 'team' },
    { _id: 'afdacc', label: 'W: Stanford Cardinal', type: 'team' },
    { _id: 'adsjkf', label: 'Cameron Brink: Stanford', type: 'player' },
    { _id: 'aidjaf', label: 'M: 2023-02-03: Stanford vs Montana', type: 'game' },
    { _id: 'uiuass', label: 'Tam Stanford: Hood', type: 'player' },
    ...
]
Here is an example of search results for stanfo with regex
Here is an example of search results for stanfo with atlas search
As I review this entire post and compare these search results, the 2 biggest concerns I have with the new Atlas Search results are actually somewhat minor:
1. I prefer the matching teams, M: Stanford Cardinal and W: Stanford Cardinal, to be the top 2 results, which they are with regex but not with Atlas Search.
2. If I search for Stanford Ca, Atlas Search returns an empty array, presumably because the mappings set minGrams to 3, and Ca, the second word, has only two letters. It still seems strange that all of Stanford Ca matches nothing.
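The minGrams guess can be illustrated with a simplified model of nGram tokenization. This is only a sketch: the nGrams helper is hypothetical, and it assumes grams are built per lowercased word with a sliding character window, which may not match what the Atlas analyzer actually does with the query.

```javascript
// Simplified model of nGram tokenization for one word.
// Assumption: grams are built per lowercased word; this is NOT Atlas's real analyzer.
function nGrams(word, minGrams, maxGrams) {
    const w = word.toLowerCase();
    const grams = [];
    // Slide a window of every size from minGrams to maxGrams across the word.
    for (let size = minGrams; size <= maxGrams; size++) {
        for (let start = 0; start + size <= w.length; start++) {
            grams.push(w.slice(start, start + size));
        }
    }
    return grams;
}

console.log(nGrams('Stanfo', 3, 5)); // grams of length 3, 4 and 5
console.log(nGrams('Ca', 3, 5));     // [] because the word is shorter than minGrams
```

Under this model, the word Ca yields no grams at all with minGrams set to 3, which would be consistent with the second query term having nothing to match against.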
Can I improve the /atlas route to sort results by the type field, returning team first? Also, how can I ensure that Stanford Ca doesn't return an empty array? Is it safe to lower minGrams from 3 to 1?
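To make the first part of the question concrete, the kind of pipeline in mind might look like the sketch below. It is untested against this index. The buildTeamFirstPipeline helper and the typeRank and searchScore field names are hypothetical, and it assumes that sorting every match before applying $limit is acceptable for a 200,000-document collection.

```javascript
// Hypothetical helper: builds a pipeline that ranks type 'team' first,
// then orders by the Atlas Search relevance score within each group.
function buildTeamFirstPipeline(text, limit = 20) {
    return [
        {
            $search: {
                index: 'default_search',
                autocomplete: { query: text, path: 'label' }
            }
        },
        {
            $addFields: {
                searchScore: { $meta: 'searchScore' },                  // relevance score from $search
                typeRank: { $cond: [{ $eq: ['$type', 'team'] }, 0, 1] } // 0 for teams, 1 otherwise
            }
        },
        { $sort: { typeRank: 1, searchScore: -1 } },
        { $limit: limit },
        { $project: { typeRank: 0, searchScore: 0 } }                   // drop the helper fields
    ];
}

console.log(JSON.stringify(buildTeamFirstPipeline('stanfo'), null, 2));
```

In the /atlas route this would replace the queryFilters array, for example db.gs__ptgc_selects.aggregate(buildTeamFirstPipeline(text)). Whether the extra $sort stage is fast enough for live search is part of what is being asked.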

