INVALID_ARGUMENT: RESOURCE_EXHAUSTED: Connection closed after GOAWAY

I'd like to change the parameter GRPC_ARG_HTTP2_MAX_PING_STRIKES, documented here (https://github.com/grpc/grpc/blob/master/doc/keepalive.md), from its default value of 2 to 0, so that the connection is no longer closed with a GOAWAY (the one I receive carries no debug data such as too_many_pings).

Why do I need to change this?

I have a Kotlin microservices architecture with coroutines that uses only unary gRPC calls. I'm simulating a production environment with a load test that runs several threads per second. After 1/3 minutes of calls between a client microservice and a server microservice, I get back:

"INVALID_ARGUMENT: RESOURCE_EXHAUSTED: Connection closed after GOAWAY. HTTP/2 error code: ENHANCE_YOUR_CALM (Bandwidth exhausted)."

I've found that setting the parameter mentioned above to 0 should make the error go away. I imagine this is some gRPC protection against DDoS.

This is our client/server config:

@Bean
fun keepAliveClientConfigurer(): GrpcChannelConfigurer {
    return GrpcChannelConfigurer { channelBuilder, _ ->
        if (channelBuilder is io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder) {
            channelBuilder
                .keepAliveTime(30, SECONDS)
                .keepAliveTimeout(5, SECONDS)
        }
    }
}

@Bean
fun keepAliveServerConfigurer(): GrpcServerConfigurer {
    return GrpcServerConfigurer { serverBuilder: ServerBuilder<*> ->
        if (serverBuilder is NettyServerBuilder) {
            // Accept client keepalive PINGs at any interval, even with no active calls.
            serverBuilder
                .permitKeepAliveTime(0, TimeUnit.NANOSECONDS)
                .permitKeepAliveWithoutCalls(true)
        }
    }
}

Any pointers on how to change this parameter? Appreciate any responses.

There are 2 best solutions below

This problem does not appear to be PING-related. gRPC server implementations include "too_many_pings" as error information in the GOAWAY, and a grpc-java client would have included that in the error message.

GRPC_ARG_HTTP2_MAX_PING_STRIKES is C-specific. The cross-language configuration options are defined in gRFC A8:

  • PERMIT_KEEPALIVE_TIME, defaulting to 5 minutes
  • PERMIT_KEEPALIVE_WITHOUT_CALLS, defaulting to false

You have found the Java configuration that allows configuring those two PERMITs, and set them to allow essentially any PINGs. If you set those with no improvement to behavior, then that is additional confirmation the problem is not PING-related.

The error message includes "INVALID_ARGUMENT: RESOURCE_EXHAUSTED:", which means the client that received this error is not the client that received the GOAWAY. There are at least three machines communicating:

A → B → C

A received the INVALID_ARGUMENT error, B received the RESOURCE_EXHAUSTED error, and C sent the GOAWAY. C is the place to double-check the PERMIT configuration, and also the implementation that matters. It is highly unlikely grpc-java sent ENHANCE_YOUR_CALM without debug data, so there's some component that is not grpc-java: either C is in a different language or there is an HTTP/2 proxy being used.
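As a rough illustration of how such a nested message can arise, here is a minimal sketch that assumes grpc-java is on the classpath. The function names are hypothetical, and the way B wraps the upstream error is an assumption; the question does not show B's error handling. It relies only on the fact that a StatusRuntimeException message has the form "CODE: description".

```kotlin
import io.grpc.Status
import io.grpc.StatusRuntimeException

/** The error B would observe when the connection to C is closed by a GOAWAY (text taken from the question). */
fun errorSeenByB(): StatusRuntimeException =
    Status.RESOURCE_EXHAUSTED
        .withDescription("Connection closed after GOAWAY. HTTP/2 error code: ENHANCE_YOUR_CALM (Bandwidth exhausted).")
        .asRuntimeException()

/** Hypothetical handling in B: the upstream message becomes the description of a new status sent to A. */
fun errorSentToA(fromC: StatusRuntimeException): StatusRuntimeException =
    Status.INVALID_ARGUMENT
        .withDescription(fromC.message)
        .asRuntimeException()

fun main() {
    // Prints: INVALID_ARGUMENT: RESOURCE_EXHAUSTED: Connection closed after GOAWAY. HTTP/2 error code: ENHANCE_YOUR_CALM (Bandwidth exhausted).
    println(errorSentToA(errorSeenByB()).message)
}
```

Each hop that re-wraps an error this way adds one more "CODE:" prefix, which is why the number of prefixes hints at the number of hops.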

My team and I got this working!

Before hitting this error, we had a problem with gRPC's Metadata(), which unfortunately is not thread-safe. We needed to pass data between requests via a gRPC header, and for that we had the following implementation:

        header.put(Key.of(REQUEST_ID_HEADER, ASCII_STRING_MARSHALLER), MDC.get(REQUEST_ID_HEADER))

        val result = adaptedController!!
            .withInterceptors(MetadataUtils.newAttachHeadersInterceptor(header))
            .searchTimeline(request)

When we changed this to use gRPC context objects, which can be used from multiple threads, the Metadata error stopped happening and, for some reason, so did the RESOURCE_EXHAUSTED error.
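To make the threading point concrete, here is a minimal sketch of building a fresh Metadata for every call instead of mutating one shared instance. It assumes grpc-java on the classpath and assumes that the original `header` object was shared between requests, which the snippet above does not show. The header name "x-request-id" and the function `headersFor` are hypothetical.

```kotlin
import io.grpc.Metadata

// Hypothetical header name; the real value of REQUEST_ID_HEADER is not shown in the post.
private val REQUEST_ID_KEY: Metadata.Key<String> =
    Metadata.Key.of("x-request-id", Metadata.ASCII_STRING_MARSHALLER)

/** Builds a new Metadata for a single call, so no instance is ever touched by two threads. */
fun headersFor(requestId: String): Metadata =
    Metadata().apply { put(REQUEST_ID_KEY, requestId) }
```

With this, a call site could pass `headersFor(MDC.get(REQUEST_ID_HEADER))` to `MetadataUtils.newAttachHeadersInterceptor` on each request rather than reusing one `header` object. This is an alternative sketch, not the fix we ended up with.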

Our main suspicion was that gRPC kept the errors caused by the non-thread-safe Metadata in some sort of buffer, and that after a few minutes at a few requests per second the buffer filled up and the server could no longer accept any connections.

We had been trying to resolve the RESOURCE_EXHAUSTED error first because of its urgency. We already had a nearly finished fix for the Metadata threading error and hadn't paid it much attention, since it seemed to be a separate issue. But for some reason the two problems turned out to be intertwined.

For the curious, this is the new implementation that solved both errors:

        val result = adaptedController!!
            .withInterceptors(UniqueTrackingNumberGrpcFilterClient())
            .searchTimeline(request)


class UniqueTrackingNumberGrpcFilterClient : ClientInterceptor {

    private val requestId = MDC.get(REQUEST_ID_HEADER)

    override fun <ReqT, RespT> interceptCall(
        method: MethodDescriptor<ReqT, RespT>,
        callOptions: CallOptions,
        next: Channel,
    ): ClientCall<ReqT, RespT> = next.newCall(
        method,
        callOptions.withCallCredentials(UniqueNumberCallCredentials(requestId))
    )
}

class UniqueNumberCallCredentials(
    private val requestId: String
) : CallCredentials() {

    private companion object {
        private val METADATA_REQUEST_KEY = Key.of(REQUEST_ID_HEADER, ASCII_STRING_MARSHALLER)!!
    }

    override fun applyRequestMetadata(
        requestInfo: RequestInfo,
        appExecutor: Executor,
        metadataApplier: MetadataApplier,
    ) {
        appExecutor.execute {
            try {
                metadataApplier.apply(Metadata().apply { put(METADATA_REQUEST_KEY, requestId) })
            } catch (e: Throwable) {
                metadataApplier.fail(Status.UNAUTHENTICATED.withCause(e))
            }
        }
    }

    override fun thisUsesUnstableApi() {}
}