Efficiently Handling Deletions in Nested Cosmos DB Data Structures with Azure Cognitive Search


How can I configure Azure Cognitive Search to detect and process deletions based on nested properties within a Cosmos DB document?

Description of the problem:

I'm currently integrating Azure Cognitive Search with a Cosmos DB collection that contains complex, nested documents like this:

{
   "name": "test name",
   "info": {
      "address": "test address",
      "isIndexing": "false"
   }
}

I want to keep the search index synchronized so that documents are removed from it based on the document's "isIndexing" property. To do this, I want to use the data deletion detection policy of Azure Cognitive Search, which I configured like this:

"dataDeletionDetectionPolicy" : {
    "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
    "softDeleteColumnName": "info.isIndexing",
    "softDeleteMarkerValue": "true"
}

The intent is that a document is deleted from the index when "isIndexing" is "true". However, this only works when "isIndexing" is a top-level property of the document, like here:

{
   "name": "test name",
   "isIndexing": "false",
   "info": {
      "address": "test address"
   }
}

with the policy defined like this:

"dataDeletionDetectionPolicy": {
    "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
    "softDeleteColumnName": "isIndexing",
    "softDeleteMarkerValue": "true"
}

However, I don't want to change the structure of the documents in Cosmos DB. I suspect that either the deletion detection policy does not support dot notation for nested properties, which would explain why the policy isn't working as expected, or the soft delete feature of Azure Cognitive Search does not support nested properties at all.

Answer by Suresh Chikkam

Regarding not wanting to change the document structure in Cosmos DB, and whether the policy supports dot notation for nested properties:

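The soft delete policy is designed around a single column: softDeleteColumnName names one field in the data the indexer reads, and a document is removed from the index when that field's value equals softDeleteMarkerValue.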
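When the marker lives in a nested object, custom code has to resolve the path one segment at a time; a flat lookup of the name "info.isIndexing" would instead look for a top-level property literally called that. The following is a minimal sketch using System.Text.Json. The class NestedFlag and the method IsMarkedForDeletion are hypothetical names, and the sketch assumes the flag may be stored either as the string "true"/"false" (as in the question) or as a JSON boolean.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: checks whether a document (raw JSON) carries the
// soft-delete marker at a dotted path such as "info.isIndexing".
public static class NestedFlag
{
    public static bool IsMarkedForDeletion(string json, string path, string markerValue)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement current = doc.RootElement;

        // Walk the path segment by segment. A top-level property literally
        // named "info.isIndexing" is NOT matched by this walk.
        foreach (string segment in path.Split('.'))
        {
            if (current.ValueKind != JsonValueKind.Object ||
                !current.TryGetProperty(segment, out JsonElement next))
            {
                return false; // path missing: keep the document in the index
            }
            current = next;
        }

        // Accept both the string form used in the question and a real boolean.
        string? actual = current.ValueKind switch
        {
            JsonValueKind.String => current.GetString(),
            JsonValueKind.True => "true",
            JsonValueKind.False => "false",
            _ => null
        };

        return string.Equals(actual, markerValue, StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, IsMarkedForDeletion(json, "info.isIndexing", "true") returns false for the first sample document in the question and true once its "isIndexing" value is changed to "true".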

Regarding how to configure Azure Cognitive Search to detect and process deletions based on nested properties within a Cosmos DB document:

Instead of relying on the soft delete policy, implement custom logic in an Azure Function triggered by the Cosmos DB change feed to detect deletions based on the nested isIndexing property.

  • Use a Cosmos DB trigger to receive inserted and updated documents from the change feed, and identify those whose nested isIndexing property marks them for removal from the search index.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json.Linq;

public static class CosmosDBTriggerFunction
{
    [FunctionName("CosmosDBTriggerFunction")]
    public static async Task Run(
        [CosmosDBTrigger(
            databaseName: "YourDatabaseName",
            collectionName: "YourCollectionName",
            ConnectionStringSetting = "CosmosDBConnectionString",
            CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> documents,
        ILogger log)
    {
        if (documents == null || documents.Count == 0)
        {
            return;
        }

        // Placeholders: in practice, read these from app settings instead of hard-coding them.
        var searchServiceName = "YourSearchServiceName";
        var indexName = "YourIndexName";
        var searchApiKey = "YourSearchServiceApiKey"; // an admin key is required to delete documents

        // SearchIndexClient (Microsoft.Azure.Search) targets a single index directly.
        ISearchIndexClient indexClient = new SearchIndexClient(searchServiceName, indexName, new SearchCredentials(searchApiKey));

        foreach (var doc in documents)
        {
            // GetPropertyValue reads top-level properties only, so parse the document
            // and resolve the nested path explicitly. The sample stores the flag as the
            // string "true"/"false"; a JSON boolean would stringify as "True"/"False".
            JToken flag = JObject.Parse(doc.ToString()).SelectToken("info.isIndexing");
            bool markedForDeletion = string.Equals(flag?.ToString(), "true", StringComparison.OrdinalIgnoreCase);

            if (markedForDeletion)
            {
                // Assumes the index key field is "id" and holds the Cosmos DB document id.
                string documentId = doc.Id;

                try
                {
                    // Deletions are sent as an index batch with the Delete action.
                    await indexClient.Documents.IndexAsync(IndexBatch.Delete("id", new[] { documentId }));
                    log.LogInformation($"Document with ID {documentId} deleted from Azure Cognitive Search.");
                }
                catch (Exception ex)
                {
                    log.LogError(ex, $"Error deleting document with ID {documentId} from Azure Cognitive Search.");
                }
            }
        }
    }
}
  • When a document is identified for deletion, remove it from the search index by its unique key, either with the Azure Cognitive Search SDK (as above) or with the REST API by posting an index batch whose action is "delete".
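If one change-feed invocation delivers many flagged documents, the per-document loop above sends one request each. A possible refinement, sketched here under assumptions, is to collect the keys first and send them in groups. The class DeleteBatching and method ToBatches are hypothetical names, and the default size of 1000 is an assumed conservative value; check the service limits for the maximum number of documents per batch that applies to your service and API version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: groups search-index keys so each group can be sent
// as one delete batch instead of one request per document.
public static class DeleteBatching
{
    // Assumed conservative batch size; verify against your service limits.
    public const int DefaultBatchSize = 1000;

    public static List<List<string>> ToBatches(IEnumerable<string> keys, int batchSize = DefaultBatchSize)
    {
        if (keys == null) throw new ArgumentNullException(nameof(keys));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<List<string>>();
        var current = new List<string>();

        // Drop duplicate keys so each document appears at most once overall.
        foreach (string key in keys.Distinct())
        {
            current.Add(key);
            if (current.Count == batchSize)
            {
                batches.Add(current);
                current = new List<string>();
            }
        }

        // Keep the final, partially filled batch.
        if (current.Count > 0)
        {
            batches.Add(current);
        }

        return batches;
    }
}
```

Inside the function, each group could then be passed to indexClient.Documents.IndexAsync(IndexBatch.Delete("id", group)), with the same assumption as before that the index key field is "id".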


The function then writes an informational log entry for each document it deletes from the Azure Cognitive Search index.