We've been using Azure Search as our production search service for a couple months now, and our clients are starting to voice concerns about flexibility in ranking and scoring - the TF-IDF ratios are meaningless to them (which I can understand) and they are used to seeing percentage matches from their prior vendors.
It's important to understand that our clients mainly query on people's names. These names exist in our records both in their own field and in an additional field of unstructured text. When they query on John Anderson, for example, they are looking for records with a certain percentage match to the name John Anderson. They are less concerned with how many times John Anderson appears in the document.
What they want is to be able to customize their results so that, for example, only results with "90% match or higher" to the queried name are returned. We have no idea where to start with this because the only thing we see Azure Search offering us is TF-IDF scoring. How can we convert our understanding of the results to percentage match instead of term frequency, which we really aren't concerned with? Can Azure Search handle this? If we've gotten this far along into choosing it as our production search service and we can't present results to our clients in the manner they were accustomed to from the vendors they left, they will leave us, and I unfortunately am going to lose my job...
MS Azure Search personnel... please help!
Let me try and propose a few options given the brief description in the comments above:
Assuming you're really focused on people name lookups and have a set of specific rules to model, perhaps you can use different matching rules with different boosts. You'll need to use the full Lucene query syntax for this (use queryType=full in the query string).
For simple cases, you can start by discriminating cases where matches are contiguous versus when they are not: if the input search is "John Anderson", you can rewrite it into:
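The query that originally followed here is not preserved, so here is a minimal sketch of what such a rewrite might look like. The helper name `contiguous_vs_loose` and the boost value of 4 are assumptions for illustration, not values from Azure Search. The resulting string would go in the `search` parameter with `queryType=full`.

```python
def contiguous_vs_loose(name: str, phrase_boost: int = 4) -> str:
    """Build a full-Lucene query that ranks the exact phrase above loose matches.

    Hypothetical helper; the boost value is an arbitrary starting point.
    Names containing Lucene special characters would need escaping first.
    """
    terms = name.split()
    # Quoted phrase: the terms must appear contiguously and in order.
    phrase = '"' + " ".join(terms) + '"'
    # Fallback clause: all terms present, in any position.
    loose = "(" + " AND ".join(terms) + ")"
    return f"{phrase}^{phrase_boost} OR {loose}"


print(contiguous_vs_loose("John Anderson"))
# "John Anderson"^4 OR (John AND Anderson)
```

Raising the boost widens the score gap between contiguous and non-contiguous matches.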
This would heavily polarize results and return them in the order you described. If you want to cap the token-level edit distance beyond which it's no longer a match, you can use slop in phrases:
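As a rough sketch of the slop idea (the original snippet is not preserved here), Lucene proximity syntax appends `~N` to a quoted phrase. The helper name `phrase_with_slop` and the default values are assumptions for illustration.

```python
def phrase_with_slop(name: str, slop: int = 1, exact_boost: int = 4) -> str:
    """Combine a boosted exact phrase with a proximity (slop) phrase.

    Hypothetical helper. The slop is the number of position moves allowed,
    so a slop of 1 tolerates one extra token between the terms,
    e.g. "John M Anderson" for the input "John Anderson".
    """
    phrase = '"' + " ".join(name.split()) + '"'
    return f"{phrase}^{exact_boost} OR {phrase}~{slop}"


print(phrase_with_slop("John Anderson", slop=2))
# "John Anderson"^4 OR "John Anderson"~2
```

Documents whose tokens are further apart than the slop allows fall out of the result set entirely, which gives you a hard cutoff rather than just a lower score.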
If you also want to handle cases where an extra term entirely before or entirely after the name affects the match, I wonder if you could use prefixes/suffixes. For example, during indexing you add the word "begin" before the name and "end" after it (you can use a more random sequence of letters that's unlikely to be a name). So if the field value was "John Anderson", then the field should contain "begin John Anderson end". Then at search time you can do:
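The search-time query is missing from this copy, so the following is only a sketch under stated assumptions: the sentinel words, the helper names `wrap_for_index` and `sentinel_query`, and the boost and slop values are all made up for illustration.

```python
BEGIN, END = "begin", "end"  # assumed sentinels; pick tokens unlikely to be names


def wrap_for_index(name: str) -> str:
    """Value to store in the name field at indexing time."""
    return f"{BEGIN} {name} {END}"


def sentinel_query(name: str, slop: int = 3) -> str:
    """Full-Lucene query with three tiers, from strictest to loosest."""
    whole = f'"{BEGIN} {name} {END}"'  # the field holds exactly this name
    infix = f'"{name}"'                # name appears contiguously somewhere
    loose = f'"{name}"~{slop}'         # name tokens with other tokens between
    return f"{whole}^16 OR {infix}^4 OR {loose}"


print(wrap_for_index("John Anderson"))
# begin John Anderson end
print(sentinel_query("John Anderson"))
# "begin John Anderson end"^16 OR "John Anderson"^4 OR "John Anderson"~3
```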
This will favor matches from beginning to end of the name, followed by infix matches of the name in exact sequence, followed by names with the original tokens plus other stuff in the middle. You can reshuffle the order by adjusting the boosts.
Following this will give you roughly the order you want, but not the percentages. To calculate percentages, you could map them from the orders of magnitude in the scores, and/or post-process the results against the original search terms.
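For the post-processing route, here is a minimal sketch that scores each returned name against the original search terms on the client side. Using `difflib.SequenceMatcher` as the similarity measure is my assumption, not something Azure Search provides; any string-similarity metric your clients accept could be swapped in. The function names and the 90% threshold are illustrative.

```python
from difflib import SequenceMatcher


def percent_match(query: str, candidate: str) -> float:
    """Case-insensitive similarity between two strings, as a percentage."""
    ratio = SequenceMatcher(None, query.lower(), candidate.lower()).ratio()
    return round(ratio * 100, 1)


def filter_by_percent(query: str, names: list[str], threshold: float = 90.0):
    """Keep names at or above the threshold, best match first."""
    scored = [(name, percent_match(query, name)) for name in names]
    kept = [(name, pct) for name, pct in scored if pct >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Names as they might come back from the name field of the search results.
hits = ["John Anderson", "Jon Anderson", "Jane Smith"]
for name, pct in filter_by_percent("John Anderson", hits):
    print(f"{name}: {pct}%")
```

This works best on the dedicated name field; for the unstructured text field you would first have to extract the matching fragment (for example via hit highlighting) before comparing.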