Searching for strings that include "-" in Azure Cognitive Search

I am facing an issue with my Azure Cognitive Search.

I have an index with a couple hundred documents. Most of them have a short sentence as their title, and every title ends with the suffix ".docx". Initially that caused problems: a search for a word scored poorly when that word was the last one before the ".docx". I fixed this by appending an "*" to the end of each search, and that worked really well.

However, I am now facing a new issue. When searching for words that include a "-", I have to remove the "*" from the end, and I am not sure how that will affect the results if, for example, the hyphenated word sits directly in front of the ".docx".

I would like to keep the functionality that comes with the added "*", while also being able to search for terms that include "-". I suspect this will be an issue with other operators as well, so a query that simply searches for the search term as a literal string would be really nice.

I have tried to escape the operator like this "search\-term", which seems to work, but when I add an "*" to the end, it doesn't return any results. As I said earlier, this could be a solution, but I am not sure whether it would score "this is my search-term.docx" highly, as the query does not have the "*" at the end.
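The escaping attempt described above can be sketched as a small helper. This is only an illustration: `LuceneQuery`, `Escape`, and `ToPrefixQuery` are hypothetical names, and the character list is assumed to be the set of special characters of the full Lucene query syntax. It only stops the query parser from reading "-" as an operator; it does not change how the title was tokenized when it was indexed.

```csharp
using System;
using System.Text;

public static class LuceneQuery
{
    // Characters assumed to act as operators in the full Lucene query syntax.
    private const string Special = "+-&|!(){}[]^\"~*?:\\/";

    // Hypothetical helper: puts a backslash in front of every special character.
    public static string Escape(string term)
    {
        var sb = new StringBuilder(term.Length * 2);
        foreach (char c in term)
        {
            if (Special.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Escapes the user's input first, then appends the prefix wildcard unescaped.
    public static string ToPrefixQuery(string term) => Escape(term) + "*";
}
```

With this, `LuceneQuery.ToPrefixQuery("search-term")` produces the query text `search\-term*`, which is the form that returned no results in the test described above.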

Another thing I tried was a regex approach, like this "/search-term.*/", but that also seems to score titles like "...search.docx" lower when searching for "/search.*/".

I suppose this could be solved without my code, as I get the same results in the Azure Cognitive Search UI, but I'll add it here for good measure.

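// Needs: using System; using Azure; using Azure.Search.Documents;
//        using Microsoft.AspNetCore.Mvc; using Newtonsoft.Json;
public ActionResult<DcFolder.DcFolderDto> GetSearchResults(string query) {
    // Append a wildcard so the last word before ".docx" is matched as a prefix.
    query = query + "*";

    Uri searchEndpointUri = new(SEARCH_ENDPOINT);

    SearchClient client = new(
        searchEndpointUri,
        SEARCH_INDEX_NAME,
        new AzureKeyCredential(SEARCH_KEY));

    // Full Lucene syntax, searching the file name and the document content.
    var options = new SearchOptions {
        SearchFields = { "metadata_storage_name", "content" },
        QueryType = Azure.Search.Documents.Models.SearchQueryType.Full,
        SearchMode = Azure.Search.Documents.Models.SearchMode.Any,
    };

    var search = client.Search<DcFolder.DcFolderDto>(query, options, default);
    // Pass the raw JSON payload of the search response through to the caller.
    var document = search.GetRawResponse().Content.ToString();

    dynamic jsonObject = JsonConvert.DeserializeObject(document);

    return Ok(jsonObject);
}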

There is 1 best solution below

Gia Mondragon - MSFT

It looks like you're using the standard Lucene analyzer, which splits text into tokens at hyphens (as it does at whitespace and most other punctuation), so "search-term" is not indexed as a single term. Take a look at how analyzers work.
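One way to check this is to ask the service which tokens an analyzer produces for a sample title. The sketch below uses the `AnalyzeText` call of `SearchIndexClient` from the `Azure.Search.Documents` SDK. Reading the endpoint, key, and index name from environment variables is an assumption made for this example, as is the requirement that the key be an admin key rather than a query key.

```csharp
using System;
using System.Collections.Generic;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

// Assumption: connection details are supplied through these environment variables.
string endpoint = Environment.GetEnvironmentVariable("SEARCH_ENDPOINT")
    ?? throw new InvalidOperationException("SEARCH_ENDPOINT is not set");
string adminKey = Environment.GetEnvironmentVariable("SEARCH_KEY")
    ?? throw new InvalidOperationException("SEARCH_KEY is not set");
string indexName = Environment.GetEnvironmentVariable("SEARCH_INDEX_NAME")
    ?? throw new InvalidOperationException("SEARCH_INDEX_NAME is not set");

var indexClient = new SearchIndexClient(new Uri(endpoint), new AzureKeyCredential(adminKey));

// Run a sample title through the standard Lucene analyzer; nothing is indexed.
var request = new AnalyzeTextOptions(
    "this is my search-term.docx",
    LexicalAnalyzerName.StandardLucene);
Response<IReadOnlyList<AnalyzedTokenInfo>> result = indexClient.AnalyzeText(indexName, request);

// Print each emitted token; wildcard queries are matched against these tokens.
foreach (AnalyzedTokenInfo token in result.Value)
{
    Console.WriteLine(token.Token);
}
```

Swapping `LexicalAnalyzerName.StandardLucene` for `LexicalAnalyzerName.Keyword` in the same call should make the difference between the two analyzers visible for the same input.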

I suggest using a different analyzer instead, such as keyword, or a custom analyzer if needed.
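As a rough sketch of that suggestion, the index definition below assigns the keyword analyzer to the file name field through the `Azure.Search.Documents` SDK. The index name "docs-keyword" and the key field "id" are hypothetical, and the environment variables are an assumption of this example. The sketch creates a new index because, as far as I know, the analyzer of an existing field cannot be changed in place, so the documents would have to be indexed again.

```csharp
using System;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

// Assumption: connection details are supplied through these environment variables.
string endpoint = Environment.GetEnvironmentVariable("SEARCH_ENDPOINT")
    ?? throw new InvalidOperationException("SEARCH_ENDPOINT is not set");
string adminKey = Environment.GetEnvironmentVariable("SEARCH_KEY")
    ?? throw new InvalidOperationException("SEARCH_KEY is not set");

var indexClient = new SearchIndexClient(new Uri(endpoint), new AzureKeyCredential(adminKey));

// "docs-keyword" and "id" are hypothetical names chosen for this sketch.
var index = new SearchIndex("docs-keyword")
{
    Fields =
    {
        new SimpleField("id", SearchFieldDataType.String) { IsKey = true },
        // The keyword analyzer keeps the whole file name as one token, hyphens included.
        new SearchableField("metadata_storage_name") { AnalyzerName = LexicalAnalyzerName.Keyword },
        // The content field keeps the default analyzer.
        new SearchableField("content"),
    }
};

indexClient.CreateOrUpdateIndex(index);
```

Because the keyword analyzer treats the entire file name as a single token, a query for a word in the middle of a title would probably need a wildcard or regex over the full name. A custom analyzer may be the better fit if individual words should stay searchable.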