I have two App Service endpoints configured in Azure Traffic Manager with the same weight (1). Some details for these two API apps:
Endpoint A: East US 2, App service plan is S2
Endpoint B: West US, App service plan is S1
Both have the same scale-out settings: min 4, max 7, default 5.
According to the documentation, the weighted routing method seems to use round-robin by default. Since the two endpoints have the same weight, I expected them to receive nearly the same number of requests (a ratio close to 1:1) in my load tests. They do not; the results fluctuate.
For example, if I start with 1000 requests ramping up over 10 seconds, the ratio of requests received by A to requests received by B could be 3:1. If I run the same test a second time, it could go the opposite way, with B receiving many more requests than A. When I increase the number of requests, I sometimes get a 1:1 result, but this random behavior is not what we want.
How can we ensure that traffic is distributed evenly across these two endpoints when using the weighted routing method in Azure Traffic Manager?
The Azure Traffic Manager documentation on the weighted traffic-routing method recommends flushing the DNS client cache while testing it.
Refer: https://learn.microsoft.com/en-us/azure/traffic-manager/traffic-manager-testing-settings#how-to-test-the-weighted-traffic-routing-method
The results of the DNS lookup are cached for the duration of the DNS Time-to-live (TTL). The default TTL for Traffic Manager is 300 seconds.
https://learn.microsoft.com/en-us/azure/traffic-manager/traffic-manager-performance-considerations#performance-considerations-for-traffic-manager-1
The duration of the cache is determined by the 'time-to-live' (TTL) property of each DNS record. Shorter values result in faster cache expiry and thus more round-trips to the Traffic Manager name servers. Longer values mean that it can take longer to direct traffic away from a failed endpoint. Traffic Manager allows you to configure the TTL used in Traffic Manager DNS responses to be as low as 0 seconds and as high as 2,147,483,647 seconds, enabling you to choose the value that best balances the needs of your application.
A TTL of 0 means that downstream DNS resolvers don’t cache query responses and all queries are expected to reach the Traffic Manager DNS servers for resolution.
Reference:
https://learn.microsoft.com/en-us/azure/traffic-manager/traffic-manager-how-it-works#traffic-manager-and-the-dns-cache
https://learn.microsoft.com/en-us/azure/traffic-manager/traffic-manager-faqs#how-high-or-low-can-i-set-the-ttl-for-traffic-manager-responses
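To illustrate why caching matters for a short load test, here is a minimal simulation sketch. It is not Traffic Manager code: the function name `simulate`, the notion of a fixed number of caching resolvers, and the assumption that each fresh lookup picks endpoint A or B with equal probability are all illustrative assumptions.

```python
import random

def simulate(num_resolvers, requests, duration_s, ttl_s, seed=0):
    """Count requests per endpoint when each resolver caches its DNS answer for ttl_s seconds."""
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0}
    # Each resolver remembers (endpoint, expiry_time) for its cached answer.
    cache = [(None, 0.0)] * num_resolvers
    for i in range(requests):
        now = duration_s * i / requests       # requests spread evenly over the test
        r = rng.randrange(num_resolvers)      # resolver this request happens to use
        endpoint, expires = cache[r]
        if endpoint is None or now >= expires:
            # Assumption: equal weights modelled as a 50/50 pick per fresh lookup.
            endpoint = rng.choice("AB")
            cache[r] = (endpoint, now + ttl_s)
        counts[endpoint] += 1
    return counts
```

With a single resolver, 1000 requests over 10 seconds and a TTL of 300 seconds, `simulate(1, 1000, 10, 300)` sends every request to whichever endpoint the first lookup returned. With `ttl_s=0` every request triggers a fresh pick and the split comes out close to even.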
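My recommendation is to reduce the TTL of your Traffic Manager profile to 5 seconds and run the test again. With a long TTL, a load test driven from one machine (or a few) resolves the profile name only once or twice and then sends every request to the cached endpoint until the TTL expires, which would explain the lopsided and inconsistent ratios you are seeing.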
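To check the distribution at the DNS level rather than through a full load test, you could tally which address each lookup returns. The sketch below is only an illustration: `tally_lookups` is a hypothetical helper, and the `resolve` and `sleep` parameters exist so it can be exercised without network access. Note that the operating system or an upstream resolver may still cache answers, so flushing the DNS client cache remains relevant.

```python
import socket
import time
from collections import Counter

def tally_lookups(hostname, lookups, pause_s,
                  resolve=socket.gethostbyname, sleep=time.sleep):
    """Resolve hostname repeatedly and count how often each IP address is returned."""
    counts = Counter()
    for _ in range(lookups):
        counts[resolve(hostname)] += 1
        # Wait between lookups so a cached answer has a chance to expire.
        sleep(pause_s)
    return counts
```

For example, with a TTL of 5 seconds you might call `tally_lookups("<your-profile>.trafficmanager.net", 20, 6)` (substituting your own profile name) and compare the counts for the two endpoint addresses.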