We are trying to calculate the average response time of our APIs by enabling Prometheus metrics in Traefik.
The expression we are currently using is:
expr: sum(traefik_service_request_duration_seconds_sum{instance="company.com:12345",service=~"backend-module-test.*"}) / sum(traefik_service_request_duration_seconds_count{instance="company.com:12345",service=~"backend-module-test.*"}) * 1000
This expression is evaluated over periods of 1m, 5m, 10m, etc., and the resulting graph is displayed on a dashboard.
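Note that `_sum` and `_count` are cumulative counters, so dividing them directly gives the average since the counters were last reset rather than the average over the chosen period. A minimal sketch of a windowed variant using `rate()` is shown here; the `[5m]` range is only an assumed example and would need to match the period being graphed:

```promql
# Sketch only: average response time in ms over the last 5 minutes.
# The [5m] window is an assumption; substitute 1m, 10m, etc. as needed.
  sum(rate(traefik_service_request_duration_seconds_sum{instance="company.com:12345",service=~"backend-module-test.*"}[5m]))
/
  sum(rate(traefik_service_request_duration_seconds_count{instance="company.com:12345",service=~"backend-module-test.*"}[5m]))
* 1000
```

If no requests arrive within the window, the division is 0/0 and the result is NaN, which typically shows up as a gap in the graph.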
Issue
- If a single request is extremely slow (more than 10s), it skews the average upward, so the average no longer reflects the typical response time. For example, nine requests at 100ms plus one at 10s average out to 1.09s.
Ask
- Basically, we want to get the average response time as accurately as possible.
- The catch is that it has to be calculated under the constraint described above.
- I am open to any other approach that can help us calculate the average response time of our APIs.
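
One alternative that might be worth considering is a median (or another percentile) instead of the mean, since a single outlier barely moves it. The sketch below assumes Traefik also exports the histogram series `traefik_service_request_duration_seconds_bucket` with buckets fine enough for the latencies of interest; the `0.5` quantile and the `[5m]` window are assumed example values, not something taken from our setup:

```promql
# Sketch only: estimated median response time in ms over the last 5 minutes.
# Assumes the _bucket series is exported; accuracy depends on the configured bucket boundaries.
histogram_quantile(
  0.5,
  sum by (le) (
    rate(traefik_service_request_duration_seconds_bucket{instance="company.com:12345",service=~"backend-module-test.*"}[5m])
  )
) * 1000
```

The result is an estimate interpolated within bucket boundaries, so with coarse buckets it can differ noticeably from the true median.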