I am working on a project that requires indexing documents in the Azure Search service. The index is later used to search against other documents uploaded by users, to find matches and similarities between the uploaded document and the documents that are already indexed. We have a requirement that matching be based on the Levenshtein algorithm.
Azure Search supports "Fuzzy Search", which uses a similar approach, but the score it returns cannot be interpreted as a Levenshtein distance.
I tried using an Azure Cognitive Search skillset to see whether I could get Azure to return Levenshtein-distance-based scores, but I didn't find any way of doing that.
For example, for the following text:
source text: "Company have its head quarter in Vienna City"
Azure Search returns a matching document, but the score cannot be interpreted as a Levenshtein distance.
result:
{
"@search.score": 4.399799,
"id": "8eddb05d-8359-4a99-a629-e098d93ae296",
"content": "Deloite have its head quarter in Vienna City."
}
However, I expect a score like the following:
Levenshtein score: 12
Is there any way to get the expected scores?
Azure Cognitive Search does not provide a built-in way to return search results with a Levenshtein distance score. A custom skillset does not help either: skillsets are designed for processing and transforming data during indexing, not for scoring search queries. However, you can work around this by computing the score yourself, outside the search service.
Once you get the query results, you can compare the input text with the content of each result to calculate the distance. In Python, you can use the python-Levenshtein package for this.
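A minimal sketch of that post-processing step, assuming the search results have already been fetched and are available as a list of dictionaries with a `content` field, as in the result shown in the question. The names `levenshtein` and `rescore` are hypothetical helpers written for this illustration, not part of any Azure SDK. The distance is implemented in plain Python so the snippet has no dependencies:

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein (edit) distance between two strings."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def rescore(query_text: str, results: list, field: str = "content") -> list:
    """Attach a Levenshtein distance to each result and sort by it.

    `results` is assumed to be a list of dicts already returned by the
    search query; `field` is the document field to compare against.
    """
    scored = [
        {**doc, "levenshtein": levenshtein(query_text, doc[field])}
        for doc in results
    ]
    # Smaller distance means a closer match.
    return sorted(scored, key=lambda doc: doc["levenshtein"])


if __name__ == "__main__":
    source = "Company have its head quarter in Vienna City"
    hits = [{
        "@search.score": 4.399799,
        "id": "8eddb05d-8359-4a99-a629-e098d93ae296",
        "content": "Deloite have its head quarter in Vienna City.",
    }]
    for doc in rescore(source, hits):
        print(doc["id"], doc["levenshtein"])
```

If python-Levenshtein is installed, its `Levenshtein.distance(a, b)` function can replace the `levenshtein` helper above and will be considerably faster on long documents. Note that the distance is computed over whole strings, so for large documents you may prefer to compare sentence by sentence.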