I'm using Polly.Contrib for HttpClient retries.
var delay = Backoff.DecorrelatedJitterBackoffV2(
    medianFirstRetryDelay: TimeSpan.FromSeconds(1),
    retryCount: 4,
    fastFirst: true);
I want the maximum wait time to be ~32 seconds median (it could be much higher because of the randomness in the jitter, which is why I say median). I carefully read the docs, which use this for the maximum median wait for this API: "between 0 and f * 2^(t+1)".
Here f=1 and t=4, which comes out to max=32. But WaitAndRetry seems to say that the count doesn't include the fastFirst 1st retry? So if I want the max wait to be ~32 sec with fastFirst, should retryCount be 4 or 5?
Update
Environment
- I'm in kubernetes using microservices, using HTTP.
- There are some single-instance workloads that manage state. If they hit OOM they could be down while restarting.
- 12-30 sec to restart if scheduled on a node that doesn't have the container image and has to download it first
- 5 sec restart if container image is ready on the k8s node
Goals by priority
- Retry the HTTP op on restarted workload as quickly as possible but without spamming 1 per second.
- I'd also like to use fastFirst: true, which would be convenient for cases such as a SQL deadlock (UPDATE or DELETE on a busy table).
- If possible, use the same retry strategy all the time, for both startup and normal calls. There are some workloads that have multiple instances (k8s replicas 3-8) that have to support stop/pause/start mid-day. Hence jitter, especially if the 1st attempt results in a timeout during a mid-day start.
- Keep it simple and understandable so that less experienced devs and non-C# devs can reason about the retry call timing.
I can tweak parameters to meet all the goals except the last one. There's just too much variation for me to reason about when a retry might happen after a failure, and I don't think I can explain this behavior to non-C# devs.
Quite frankly, I don't understand your requirements in their entirety, but here are my thoughts.
DecorrelatedJitterBackoffV2
This is a specialized sleep duration provider. Depending on the provided parameters it will generate a sequence of sleep durations (IEnumerable<TimeSpan>). So, it is iterable.
If you use this provider in your retry policy, it will wait between the failed attempt and the next retry attempt for as long as the next value from this iterable.
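Because the provider is just an IEnumerable<TimeSpan>, one way to see what it produces is to enumerate it directly. The following is a minimal sketch, assuming the Polly.Contrib.WaitAndRetry NuGet package is referenced. The class and method names are illustrative only, and the seed value is an arbitrary choice to make runs repeatable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Polly.Contrib.WaitAndRetry;

public static class DelayInspector
{
    // Materializes the sleep durations for a 1 second median first retry delay.
    // "GetDelays" is a hypothetical helper, not part of Polly.
    public static List<TimeSpan> GetDelays(int retryCount, bool fastFirst, int? seed = null) =>
        Backoff.DecorrelatedJitterBackoffV2(
            medianFirstRetryDelay: TimeSpan.FromSeconds(1),
            retryCount: retryCount,
            seed: seed,
            fastFirst: fastFirst).ToList();

    public static void Main()
    {
        // Parameters from the question; the seed (42) is an assumption for repeatability.
        var delays = GetDelays(retryCount: 4, fastFirst: true, seed: 42);

        for (var i = 0; i < delays.Count; i++)
            Console.WriteLine($"retry {i + 1}: wait {delays[i].TotalSeconds:F2}s");

        Console.WriteLine($"entries: {delays.Count}, total: {delays.Sum(d => d.TotalSeconds):F2}s");
    }
}
```

Counting the entries and checking whether the first one is zero should show how fastFirst interacts with retryCount in the package version you have installed. Running it without a seed a few times also gives a feel for how much the total varies.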
Time constraints
The sleep duration provider controls only the delays between two attempts. In other words, it does not have any effect on how long a given attempt takes.
If you have a policy chain like this
Policy.WrapAsync(retryPolicy, timeoutPolicy)
then you have constrained the individual attempts (including the original action as well). So, with this in hand you could calculate the worst-case scenario: at most how many retries could be issued, how much time each attempt could take, and what the delays between two attempts are.
If you have a policy chain like this
Policy.WrapAsync(timeoutPolicy, retryPolicy)
then you have constrained the overall time which could be spent on retries. So, this is an overarching time constraint. With this in hand, all you can say is when, in the worst case, the retry will give up. But you don't know how many retry attempts could be issued during this period, since you don't have an explicit upper bound on each attempt.
You can combine the two approaches and create a policy chain like this:
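A minimal sketch of a chain with both an overall and a per-attempt timeout, assuming the Polly (v7-style API) and Polly.Contrib.WaitAndRetry packages. The policy names, the 60 second and 5 second limits, and the handled HttpRequestException are illustrative assumptions, not values taken from the question.

```csharp
using System;
using System.Net.Http;
using Polly;
using Polly.Contrib.WaitAndRetry;
using Polly.Timeout;
using Polly.Wrap;

public static class ResiliencePolicies
{
    public static AsyncPolicyWrap Create()
    {
        // Outermost: caps the total time across all attempts and delays (assumed value).
        var overallTimeoutPolicy = Policy.TimeoutAsync(TimeSpan.FromSeconds(60));

        // Middle: retries on HTTP failures and on timeouts thrown by the inner policy.
        var retryPolicy = Policy
            .Handle<HttpRequestException>()
            .Or<TimeoutRejectedException>()
            .WaitAndRetryAsync(Backoff.DecorrelatedJitterBackoffV2(
                medianFirstRetryDelay: TimeSpan.FromSeconds(1),
                retryCount: 4,
                fastFirst: true));

        // Innermost: caps each individual attempt (assumed value).
        var perAttemptTimeoutPolicy = Policy.TimeoutAsync(TimeSpan.FromSeconds(5));

        // Policies are listed from outermost to innermost.
        return Policy.WrapAsync(overallTimeoutPolicy, retryPolicy, perAttemptTimeoutPolicy);
    }
}
```

With Polly's default (optimistic) timeout strategy, the executed delegate needs to honor the CancellationToken that ExecuteAsync passes in, for example by forwarding it to the HttpClient call.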
Here you would have a limit for each attempt and for all attempts as a whole as well.
Combining policies
If you have a timeout policy as an inner policy and a retry as an outer policy, then you should alter your retry to trigger for timeouts as well. In the case of Polly, the timeout policy throws a TimeoutRejectedException, not an OperationCanceledException.
So, you should add an .Or<TimeoutRejectedException>() builder method call to your retry policy definition.
UPDATE #1
The medianFirstRetryDelay is used to calculate the next values. You can consider it as a seed for the exponential backoff function.
You can't control the max generated delay via the parameters.
But with the following simple wrapper you can:
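A minimal sketch of one way to cap the delays, written as an extension method over any IEnumerable<TimeSpan> so that it does not depend on Polly itself. The names DelayExtensions and CapAt are hypothetical, and the sample durations in Main are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DelayExtensions
{
    // Lazily replaces every delay longer than maxDelay with maxDelay.
    public static IEnumerable<TimeSpan> CapAt(this IEnumerable<TimeSpan> delays, TimeSpan maxDelay)
    {
        foreach (var delay in delays)
            yield return delay > maxDelay ? maxDelay : delay;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in values; in practice the source would be the backoff provider.
        var sample = new[] { 0.0, 1.2, 3.5, 9.8, 21.4 }.Select(s => TimeSpan.FromSeconds(s));

        foreach (var delay in sample.CapAt(TimeSpan.FromSeconds(8)))
            Console.WriteLine($"{delay.TotalSeconds:F1}s");
    }
}
```

With Polly.Contrib.WaitAndRetry referenced, the same extension could be chained onto the result of Backoff.DecorrelatedJitterBackoffV2 and the capped sequence passed to WaitAndRetryAsync. Note that capping shifts the distribution: the jitter is preserved below the cap, but every delay above it collapses to the same value.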
Then, if you print out the results, you should see something like this: