Handling service bus Message.Complete() exceptions

Consider this scenario: an Azure Service Bus with message deduplication enabled, a single topic with a single subscription, and an application that receives messages from that subscription.

How can I ensure that the application receives each message once and only once?

Here is the code I'm using in my application to receive messages:

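using System;
using Microsoft.ServiceBus.Messaging;

public abstract class ServiceBusListener<T> : IServiceBusListener
{
    private SubscriptionClient subscriptionClient;
    // ..... snip

    private void ReceiveMessages()
    {
        // Peek-lock receive; returns null if no message arrives within 5 seconds.
        BrokeredMessage message = this.subscriptionClient.Receive(TimeSpan.FromSeconds(5));

        if (message != null)
        {
            // GetBody<T>() takes no message argument; it deserializes this message's body.
            T payload = message.GetBody<T>();

            try
            {
                DoWork(payload);

                // Removes the message from the subscription.
                message.Complete();
            }
            catch (Exception exception)
            {
                // message.Complete failed (this block also catches exceptions from DoWork)
            }
        }
    }
}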

The problem I foresee is that if message.Complete() fails for whatever reason, the message that has just been processed will remain on the subscription in Azure. When ReceiveMessages() is called again, it will pick up that same message and the application will do the same work again.

Whilst the best solution would be to have idempotent domain logic (DoWork(payload)), this would be very difficult to write in this instance.

The only way I can see to ensure once-and-only-once delivery to an application is to build another queue that acts as an intermediary between the Azure service bus and the application. I believe this is called a 'durable client-side queue'.

However, I can see this being a potential issue for a lot of applications that use Azure service bus, so is a durable client-side queue the only solution?

There are 3 best solutions below

Best answer

I face similar challenges on a large-scale Azure platform I am responsible for. I combine ideas from the Compensating Transaction pattern (https://msdn.microsoft.com/en-us/library/dn589804.aspx) and the Event Sourcing pattern (https://msdn.microsoft.com/en-us/library/dn589792.aspx). Exactly how you apply these concepts will vary, but ultimately you may need to plan your own "rollback" logic, or a way to detect that a previous run completed successfully in every respect except removing the message. If there is something you can check up front that tells you the message simply was not removed, complete it and move on. How expensive that check is may make this a bad idea. You can even create an artificial final step, such as adding a row to a DB, that runs only when DoWork reaches the end. You can then check for that row before processing any other messages.

IMO, the best approach is to make each step in your DoWork() check whether its work has already been performed (if possible). For example, if a step creates a DB table, guard it with an "IF NOT EXISTS(SELECT TABLE_NAME FROM INFORMATION_SCHEMA..." check. Then, even in the unlikely event that the message is delivered again, it is safe to process it again.

Another approach I use is to store an identifier for each of the previous X messages (e.g. 10,000) and check for its existence (NOT IN) before I proceed with processing a message. The broker-assigned SequenceNumber is the sequential bigint on each message; MessageId is a string set by the sender. This is not as expensive as you might think, and it is very safe. If the identifier is found, simply Complete() the message and move on. In other situations, I update the message with a "starting" type status (inline in certain queue types, persisted elsewhere in others), then proceed. If you read a message and its status is already "started", you know something either failed or did not clear appropriately.
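The "remember the last X messages" idea can be sketched as below. This is a minimal in-memory illustration, not the answerer's actual implementation: RecentMessageLog and its members are hypothetical names, and a real version would keep the identifiers in durable storage (the answer uses a database) so they survive a restart.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Service Bus SDK): remembers the
// identifiers of the most recently processed messages, up to a fixed capacity.
public sealed class RecentMessageLog
{
    private readonly int capacity;
    private readonly HashSet<long> seen = new HashSet<long>();
    private readonly Queue<long> order = new Queue<long>();

    public RecentMessageLog(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        this.capacity = capacity;
    }

    // True if the identifier was recorded and has not been evicted yet.
    public bool WasProcessed(long id)
    {
        return seen.Contains(id);
    }

    // Record an identifier after the work succeeded; evict the oldest when full.
    public void MarkProcessed(long id)
    {
        if (!seen.Add(id)) return; // already recorded, nothing to do
        order.Enqueue(id);
        if (order.Count > capacity)
        {
            seen.Remove(order.Dequeue());
        }
    }
}
```

In the receive loop, one would check WasProcessed(message.SequenceNumber) first and, if it returns true, call message.Complete() and skip DoWork; otherwise run DoWork, call MarkProcessed, then Complete(). Note that recording the identifier and doing the work are still two separate steps, so this narrows the window for duplicates rather than removing it, unless both happen in one transaction.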

Sorry this is not a clear cut answer, but there are a lot of considerations.

Kindest regards...

Answer

You can continue to use a single subscription if your message handling includes logic to detect whether the message has already been processed successfully, or which stage it had reached.

For example, I use service bus messages to insert payments from an external payment system into a CRM system. The message handling logic first checks whether the payment already exists in CRM (using unique ids associated with the payment) before inserting. This was required because, very occasionally, a payment would be added to CRM successfully but not reported back as such (timeout or connectivity). Using Receive/Delete when picking up a message would mean that payments could be lost, while not checking that the payment already exists could result in duplicate payments.

If this is not possible then another solution I have applied is updating table storage to record the progress of handling a message. When picking up a message the table is checked to see if any stages have already been completed. This allows a message to continue from the stage it had reached previously.
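A rough sketch of that stage-tracking idea follows. It is an assumption-laden illustration: StageTracker is a hypothetical class, and a Dictionary stands in for table storage, which a real handler would read and write instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical progress store; the Dictionary stands in for table storage.
public sealed class StageTracker
{
    private readonly Dictionary<string, int> completed = new Dictionary<string, int>();

    // Number of stages already recorded as completed for this message.
    public int CompletedStages(string messageId)
    {
        int done;
        return completed.TryGetValue(messageId, out done) ? done : 0;
    }

    // Runs the stages in order, skipping those already recorded as completed.
    // If a stage throws, the progress recorded so far is kept for the next attempt.
    public void Process(string messageId, IReadOnlyList<Action> stages)
    {
        for (int i = CompletedStages(messageId); i < stages.Count; i++)
        {
            stages[i]();                  // may throw
            completed[messageId] = i + 1; // record that stage i finished
        }
    }
}
```

When the same message is received again, Process resumes at the first stage that was not recorded. A stage whose work finished but whose progress record could not be written will still run twice, so each stage should be as repeatable as practical.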

The most likely cause of the scenario you outline is that the time taken by DoWork exceeds the lock duration on the message. The message lock timeout can be adjusted to a value that safely exceeds the expected DoWork period. It is also possible to call RenewLock on a message within the handler if you are able to track the time taken to process against the message lock expiry.
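Tracking processing time against the lock expiry could look roughly like this. LockKeeper is a hypothetical helper that takes the expiry time and the renew action as parameters so that it stays independent of the SDK; the margin value is an arbitrary assumption.

```csharp
using System;

// Hypothetical helper: decides whether a peek-lock should be renewed now.
public static class LockKeeper
{
    // Invokes renewLock when the lock expires within `margin` of `nowUtc`.
    // Returns true if a renewal was attempted, false if the lock had enough time left.
    public static bool RenewIfExpiringSoon(
        DateTime lockedUntilUtc, DateTime nowUtc, TimeSpan margin, Action renewLock)
    {
        if (lockedUntilUtc - nowUtc > margin)
        {
            return false;
        }

        renewLock();
        return true;
    }
}
```

Between the steps of a long DoWork, the handler might call LockKeeper.RenewIfExpiringSoon(message.LockedUntilUtc, DateTime.UtcNow, TimeSpan.FromSeconds(10), message.RenewLock). If the lock has already expired by then, RenewLock throws, and the message should be treated as possibly redelivered.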

Maybe I misunderstand the design principle of a second queue but it seems as if this would be just as vulnerable to the original scenario you outlined.

It is hard to give a definitive answer without knowing what your DoWork() involves, but I would consider one or a combination of the above a better solution.

Answer

The default behavior when you dequeue a message is called "Peek-Lock": it locks the message so that no one else can receive it while you're processing it, and removes it when you commit. If you fail to commit, the message is unlocked, so it can be picked up again. This is probably what you are experiencing. You can change the behavior to "Receive and Delete", which deletes the message from the queue as soon as you receive it for processing. https://msdn.microsoft.com/en-us/library/azure/hh780770.aspx

https://azure.microsoft.com/en-us/documentation/articles/service-bus-dotnet-how-to-use-topics-subscriptions/#how-to-receive-messages-from-a-subscription