We are seeing intermittent, but daily, failures across all our services that use Azure Cosmos DB for MongoDB. Even though the write pressure is not high, seemingly random upsert operations suddenly stall for about one minute and then throw a MongoDB.Driver.MongoExecutionTimeoutException: Operation exceeded time limit. It happens on some days more than others.

All our services write into separate collections of a serverless MongoDB 4.2 database. We do not use transactions or complicated write operations: literally everything is a read by "_id", possibly a change to one value, and then an upsert of the whole document using ReplaceOneAsync with a filter on _id. This should be a super fast operation.

We use C# and MongoDB.Driver NuGet package version 2.23.1.

Basically all of our code goes through a common code fragment shipped in a custom NuGet package:

public async Task UpsertAsync(T item)
{
    await (await _mongoCollection.Value).ReplaceOneAsync(
        i => i.Id == item.Id,
        item,
        new ReplaceOptions { IsUpsert = true },
        CancellationToken.None);
}

public async Task<T?> TryGetByIdAsync(string id)
{
    var res = await (await _mongoCollection.Value).FindAsync(f => f.Id == id);
    return res.FirstOrDefault();
}

// where T is of this base type:
public class MongoCollectionItem : CollectionItem
{
    [BsonId]
    [JsonPropertyName("_id")]
    public string Id { get; init; } = Guid.NewGuid().ToString();
}
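
To make the access pattern concrete, here is a minimal sketch of what a typical caller does with the two methods above: read by id, change one value, and write the whole document back. The `Counter` type, `InMemoryStore`, and `Job.IncrementAsync` are hypothetical names invented for this illustration and are not our production code. The in-memory store only mimics the shape of `TryGetByIdAsync` and `UpsertAsync`, so the example runs without a database.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical document type, standing in for a real T : MongoCollectionItem.
public record Counter(string Id, int Value);

// Hypothetical in-memory stand-in for the NuGet wrapper (assumption: same two methods).
public class InMemoryStore
{
    private readonly ConcurrentDictionary<string, Counter> _items = new();

    // Mirrors TryGetByIdAsync: returns null when no document has this id.
    public Task<Counter?> TryGetByIdAsync(string id) =>
        Task.FromResult<Counter?>(_items.TryGetValue(id, out var item) ? item : null);

    // Mirrors UpsertAsync: replaces the whole document, or inserts it if missing.
    public Task UpsertAsync(Counter item)
    {
        _items[item.Id] = item;
        return Task.CompletedTask;
    }
}

public static class Job
{
    // Read by id, modify one value, upsert the whole document.
    public static async Task<Counter> IncrementAsync(InMemoryStore store, string id)
    {
        var existing = await store.TryGetByIdAsync(id);
        var updated = existing is null
            ? new Counter(id, 1)
            : existing with { Value = existing.Value + 1 };
        await store.UpsertAsync(updated);
        return updated;
    }
}
```

In the real services the same two calls go to Cosmos DB through the wrapper, and it is the `UpsertAsync` step that occasionally times out.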

Since we use a Serverless instance, we do not think this is throttling. It puzzles us why any operation would take a minute and time out, since we really only act on quite small documents, all in their separate collections.

The Cosmos DB instance has retryable writes and server-side retry enabled.

The full exception stack is this:

Error processing job for serial xxxxxx.x.x.xxx: MongoDB.Driver.MongoExecutionTimeoutException: Operation exceeded time limit.
   at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.ProcessResponse(ConnectionId connectionId, CommandMessage responseMessage)
   at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.ExecuteAsync(IConnection connection, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Servers.Server.ServerChannel.ExecuteProtocolAsync[TResult](IWireProtocol`1 protocol, ICoreSession session, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.RetryableWriteOperationExecutor.ExecuteAsync[TResult](IRetryableWriteOperation`1 operation, RetryableWriteContext context, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.BulkUnmixedWriteOperationBase`1.ExecuteBatchAsync(RetryableWriteContext context, Batch batch, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.BulkUnmixedWriteOperationBase`1.ExecuteBatchesAsync(RetryableWriteContext context, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.BulkMixedWriteOperation.ExecuteBatchAsync(RetryableWriteContext context, Batch batch, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.BulkMixedWriteOperation.ExecuteAsync(IWriteBinding binding, CancellationToken cancellationToken)
   at MongoDB.Driver.OperationExecutor.ExecuteWriteOperationAsync[TResult](IWriteBinding binding, IWriteOperation`1 operation, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.ExecuteWriteOperationAsync[TResult](IClientSessionHandle session, IWriteOperation`1 operation, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.BulkWriteAsync(IClientSessionHandle session, IEnumerable`1 requests, BulkWriteOptions options, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSessionAsync[TResult](Func`2 funcAsync, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionBase`1.ReplaceOneAsync(FilterDefinition`1 filter, TDocument replacement, ReplaceOptions options, Func`3 bulkWriteAsync)
   at COMPANYNAME.Infrastructure.Persistence.Mongo.MongoCollection`1.UpsertAsync(T item)
...