My .NET Core 3.1 app uses Polly 7.1.0 retry and bulkhead policies for HTTP resilience. The retry policy uses HandleTransientHttpError() to catch possible HttpRequestException.
Now HTTP requests fired with MyClient sometimes throw an HttpRequestException. Around half of them are caught and retried by Polly. The other half, however, end up in my try-catch block and I have to retry them manually. This happens before the maximum number of retries is exhausted.
How did I manage to create a race condition preventing Polly from catching all exceptions? And how can I fix this?
I register the policies with the IHttpClientFactory as follows.
public void ConfigureServices(IServiceCollection services)
{
    services.AddHttpClient<MyClient>(c =>
    {
        c.BaseAddress = new Uri("https://my.base.url.com/");
        c.Timeout = TimeSpan.FromHours(5); // Generous timeout to accommodate retries
    })
    .AddPolicyHandler(GetHttpResiliencePolicy());
}
private static AsyncPolicyWrap<HttpResponseMessage> GetHttpResiliencePolicy()
{
    // Sequence of sleep durations, one per retry (named "delays" so it does not clash with the lambda parameter below)
    var delays = Backoff.DecorrelatedJitterBackoffV2(medianFirstRetryDelay: TimeSpan.FromSeconds(1), retryCount: 5);
    var retryPolicy = HttpPolicyExtensions
        .HandleTransientHttpError() // This should catch HttpRequestException
        .OrResult(msg => msg.StatusCode == HttpStatusCode.NotFound)
        .WaitAndRetryAsync(
            sleepDurations: delays,
            onRetry: (response, delay, retryCount, context) => LogRetry(response, retryCount, context));
    var throttlePolicy = Policy.BulkheadAsync<HttpResponseMessage>(maxParallelization: 50, maxQueuingActions: int.MaxValue);
    // Retry is the outer policy, bulkhead the inner one
    return Policy.WrapAsync(retryPolicy, throttlePolicy);
}
The MyClient that is firing the HTTP requests looks as follows.
public async Task<TOut> PostAsync<TOut>(Uri requestUri, string jsonString)
{
    try
    {
        using (var content = new StringContent(jsonString, Encoding.UTF8, "application/json"))
        using (var response = await httpClient.PostAsync(requestUri, content)) // This throws HttpRequestException
        {
            // Handle response
        }
    }
    catch (HttpRequestException ex)
    {
        // This should never be hit, but unfortunately is
    }
}
Here is some additional information, although I'm not sure that it's relevant.
- Since the HttpClient is DI-registered transiently, there are 10 instances of it flying around per unit of work.
- Per unit of work, the client fires ~400 HTTP requests.
- The HTTP requests are lengthy (5 min duration, 30 MB response).
Retry and the HttpRequestException
With Polly policies, we can distinguish two kinds of exceptions:
- Handled exception: the policy is configured to handle it (HttpRequestException).
- Unhandled exception: the policy is not configured to handle it (WebException in our case).
A handled exception can still reach your catch block if some of your requests run out of retry attempts. In other words, there are some requests which could not succeed in 6 attempts (5 retries and 1 initial attempt).
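The difference between the two kinds of exceptions can be sketched by counting invocations. This is a minimal, standalone illustration rather than the code from the question: the class and method names are hypothetical, the always-failing action stands in for the real HTTP call, and a plain RetryAsync(5) replaces the jittered WaitAndRetryAsync so the example runs without delays.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

// Hypothetical helper, not part of the question's code base.
public static class HandledVsUnhandledSketch
{
    // Runs an always-failing action through a 5-retry policy
    // and returns how many times the action was invoked.
    public static async Task<int> CountAttemptsAsync(Func<Exception> exceptionFactory)
    {
        var attempts = 0;

        // Only HttpRequestException is a handled exception for this policy.
        var retryPolicy = Policy
            .Handle<HttpRequestException>()
            .RetryAsync(5);

        try
        {
            await retryPolicy.ExecuteAsync(async () =>
            {
                attempts++;
                await Task.CompletedTask;
                throw exceptionFactory();
            });
        }
        catch (Exception)
        {
            // Either the retries ran out (handled exception, re-thrown by the policy)
            // or the exception was never handled in the first place.
        }

        return attempts;
    }

    public static async Task Main()
    {
        var handled = await CountAttemptsAsync(() => new HttpRequestException("simulated"));
        var unhandled = await CountAttemptsAsync(() => new WebException("simulated"));
        Console.WriteLine($"handled: {handled} attempts, unhandled: {unhandled} attempt(s)");
    }
}
```

With the handled exception the action runs 6 times (1 initial attempt plus 5 retries) before the exception surfaces; with the unhandled one it runs once and the exception surfaces immediately.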
Whether the retries were exhausted can be easily verified with one of the following two tools:
- onRetry + context
- Fallback + context
onRetry + context
The onRetry delegate is called when the retry policy is triggered, but before the sleep duration. The delegate receives the retryCount. So, to be able to relate separate log entries of the same request, you need to use some sort of correlation id. The simplest way to have one can be coded like this:
Here is a simplified example:
The method to be executed
The policy
The usage
The sample output
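A minimal sketch of the onRetry + context approach could look like the following. It is an illustration under stated assumptions, not the original snippets: the "CorrelationId" key, the in-memory Log list and the always-failing method are all hypothetical, and the short fixed sleep replaces the jittered backoff from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

// Hypothetical sketch, not the code from the question.
public static class OnRetryContextSketch
{
    // Assumed key under which the correlation id is stored in the Polly Context.
    private const string CorrelationKey = "CorrelationId";

    // Log entries are collected in memory so they can be inspected afterwards.
    public static readonly List<string> Log = new List<string>();

    // The method to be executed: a stand-in that always fails with the handled exception.
    private static async Task AlwaysFailAsync(Context ctx)
    {
        await Task.CompletedTask;
        throw new HttpRequestException("simulated transient failure");
    }

    public static async Task RunAsync()
    {
        // The policy: 2 retries, each logged together with the correlation id.
        var retryPolicy = Policy
            .Handle<HttpRequestException>()
            .WaitAndRetryAsync(
                2,
                attempt => TimeSpan.FromMilliseconds(10),
                (exception, sleepDuration, retryCount, ctx) =>
                    Log.Add($"{ctx[CorrelationKey]} retry #{retryCount}: {exception.Message}"));

        // The usage: one Context (and therefore one correlation id) per request.
        var executionContext = new Context("PostAsync");
        executionContext[CorrelationKey] = Guid.NewGuid();

        try
        {
            await retryPolicy.ExecuteAsync(ctx => AlwaysFailAsync(ctx), executionContext);
        }
        catch (HttpRequestException ex)
        {
            // Reached only after the initial attempt and both retries have failed.
            Log.Add($"{executionContext[CorrelationKey]} gave up: {ex.Message}");
        }
    }
}
```

All entries of one request share the same id, so a "gave up" entry preceded by the full number of retry entries with that id shows that the exception reached the catch block because the retries ran out.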
Fallback
As mentioned, whenever the policy can't succeed, it re-throws the handled exception. In other words, if a policy fails, it escalates the problem to the next level (the next outer policy).
Here is a simplified example:
The policy
The usage
The sample output
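A minimal sketch of the Fallback + context approach, again under assumptions: the names, the "CorrelationId" key and the in-memory Log list are hypothetical, and the failing delegate stands in for the real HTTP call. The fallback policy is the outer policy, so it only sees the HttpRequestException after the inner retry policy has given up.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

// Hypothetical sketch, not the code from the question.
public static class FallbackContextSketch
{
    // Assumed key under which the correlation id is stored in the Polly Context.
    private const string CorrelationKey = "CorrelationId";

    public static readonly List<string> Log = new List<string>();

    public static async Task RunAsync()
    {
        // Inner policy: 2 immediate retries, each logged with the correlation id.
        var retryPolicy = Policy
            .Handle<HttpRequestException>()
            .RetryAsync(2, (exception, retryCount, ctx) =>
                Log.Add($"{ctx[CorrelationKey]} retry #{retryCount}"));

        // Outer policy: runs only when the retry policy re-throws the handled exception.
        var fallbackPolicy = Policy
            .Handle<HttpRequestException>()
            .FallbackAsync(
                (exception, ctx, cancellationToken) =>
                {
                    Log.Add($"{ctx[CorrelationKey]} retries exhausted: {exception.Message}");
                    return Task.CompletedTask;
                },
                (exception, ctx) => Task.CompletedTask);

        // The first argument is the outermost policy.
        var strategy = Policy.WrapAsync(fallbackPolicy, retryPolicy);

        var executionContext = new Context("PostAsync");
        executionContext[CorrelationKey] = Guid.NewGuid();

        await strategy.ExecuteAsync(
            async ctx =>
            {
                await Task.CompletedTask;
                throw new HttpRequestException("simulated transient failure");
            },
            executionContext);
    }
}
```

Here the fallback swallows the exception after logging it, which is convenient for diagnosis; whether swallowing is acceptable in production code depends on how the caller is expected to react to a failed request.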