Spring cloud gateway intermittent delay at proxy request and response handling with reactor netty

280 Views Asked by At

I am using Spring Cloud Gateway, observing intermittent delay while sending request or receiving response from the downstream applications. The notable delay occurs in about 15% of the requests (more than 1 second) with the downstream API responding in 6ms each time. Most of the requests are served without any delay but this delay is happening for each api exposed via the gateway. A lot of the delayed requests see a consistent time gap of a close proximity to 5s (eg., 5.009s, 5.013s, 5.050s) but some are less than 5s in delay as well. At the request level it happens before acquiring the channel and sometimes at the response handling level too(downstream server logs response in 6ms). Sometimes it happens with both the request and response. And at the response level, time delays are often close to 5s or 10s. The load on the server is not much either (the gateway is handling around 150 requests per minute). There is a network load balancer between the downstream service and the api gateway

Here are the logs which I am seeing at the request level.

18:00:15.093  [reactor-http-epoll-4]    Client observation  {name=http.client.requests(null), error=null, context=name='http.client.requests', contextualName='null', error='null', lowCardinalityKeyValues=[http.method='GET', http.status_code='UNKNOWN', spring.cloud.gateway.route.id='app-name', spring.cloud.gateway.route.uri='**uri**'], highCardinalityKeyValues=[http.uri='**uri**'], map=[class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{duration(seconds)=3.8563E-5, duration(nanos)=38563.0, startTimeNanos=1754421052833403}', class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{span=6548dc52fc6e10b02a9532a95c23527d/4c632cc89e1e9109}', class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@7e7c9a4b'], parentObservation={name=spring.security.http.secured.requests(secured request), error=null, context=name='spring.security.http.secured.requests', contextualName='secured request', error='null', lowCardinalityKeyValues=[], highCardinalityKeyValues=[], map=[class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{duration(seconds)=5.021295641, duration(nanos)=5.021295641E9, startTimeNanos=1754416031637319}', class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{span=6548dc52fc6e10b02a9532a95c23527d/81e269c3a499bfaa}', class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@7c376fc9'], parentObservation={name=spring.security.filterchains(security filterchain before), error=null, context=name='spring.security.filterchains', contextualName='security filterchain before', error='null', lowCardinalityKeyValues=[spring.security.filterchain.position='9', highCardinalityKeyValues=[], map=[class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{duration(seconds)=-1.0E-9, duration(nanos)=-1.0, startTimeNanos=1754416030032697}', class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{span=6548dc52fc6e10b02a9532a95c23527d/6dd704053c32d13b}', class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@151cea6d'], parentObservation={name=http.server.requests(null), error=null, context=name='http.server.requests', contextualName='null', error='null', lowCardinalityKeyValues=[exception='none', method='GET', outcome='SUCCESS', status='200', uri='UNKNOWN'], highCardinalityKeyValues=[http.url='**path**'], map=[class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{duration(seconds)=5.023806823, duration(nanos)=5.023806823E9, startTimeNanos=1754416029212405}', class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{span=6548dc52fc6e10b02a9532a95c23527d/2a9532a95c23527d}', class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@db0eebe'], parentObservation=null}}}} created for the request. New headers are [User-Agent:"Apache-HttpClient/4.5.13 (Java/1.8.0_352)", Transfer-Encoding:"chunked", Forwarded:"proto=https;host=host;]   reactor-http-epoll-4    org.springframework.cloud.gateway.filter.headers.observation.ObservedRequestHttpHeadersFilter
18:00:20.100  [reactor-http-epoll-3]    [3efcbab2, L:/gateway:port - R:app-name/**ip**] Channel acquired, now: 2 active connections, 0 inactive connections and 0 pending acquire requests.
18:00:20.101  [reactor-http-epoll-3]    [3efcbab2-2, L:/gateway:port - R:app-server/**ip**] onStateChange(GET{uri=**uri**, connection=PooledConnection{channel=[id: 0x3efcbab2, L:/gateway:port - R:app-server/**ip**]}}, [request_sent])

The channel acquire log came 5s after the NettyRoutingFilter has run all the header filters. After the channle acquire, there are some logs for request_prepared and handler mapped as well, which I have trimmed from the posted logs

Configuration - Linux 2 core processor (With usual cpu or memory utilization)

Spring cloud gateway starter - 4.0.5, Spring webflux - 3.0.6

Spring cloud gateway httpclient:

FIXED pool with max connections - 1000, max idle time - 5s.

As I am finding it difficult to troubleshoot the issue since it is not replicated anywhere else other than where the application is deployed, what could be the issue here or how can I find the root cause?

0

There are 0 best solutions below