We are using AwsCrypto from the AWS Encryption SDK for Java for encryption and decryption. We follow the pattern described in this AWS doc for using it with the data key cache enabled.

For a few requests, the TLS handshake with the KMS server intermittently takes a long time (according to the logs, up to 50 seconds before the connection is retried). Other, similar requests complete the TLS handshake within milliseconds.

According to the logs, the socket connection timeout is set to 2000 ms. For some reason, though, no timeout occurs: the thread stays blocked waiting for the handshake response for more than 30 seconds, sometimes up to 50 seconds.

This is a real problem because the thread is blocked for no apparent reason. As our service scales, this could become a bottleneck, and we want to eliminate these KMS-related latency spikes.

Related logs

*.*.awssdk.http.apache.internal.conn.SdkTlsSocketFactory: Connecting socket to kms.us-east-1.amazonaws.com/52.119.199.83:443 with timeout 2000

2023-12-09T14:26:04.993Z *.*.awssdk.http.apache.*.conn.SdkTlsSocketFactory: Starting handshake

2023-12-09T14:26:35.029Z *.*.awssdk.request: Retryable error detected. Will retry in 43ms. Request attempt number 2

As the logs show, once the handshake started, the connection was not abandoned for 30 seconds before the retry. Yet the socket connect timeout was 2 seconds, as shown in the first log line.
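To illustrate the difference between the two timeouts, here is a minimal plain-JDK sketch. It does not go through the AWS SDK or `SdkTlsSocketFactory`, and the class and method names (`HandshakeTimeoutDemo`, `measureStalledHandshakeMs`) are hypothetical. It assumes a local `ServerSocket` that completes the TCP connect but never answers the TLS ClientHello. Under that assumption, `Socket.connect(address, timeout)` bounds only the TCP connect, while a stalled `startHandshake()` is bounded only by the socket read timeout (`SO_TIMEOUT`). Whether the SDK's socket timeout is what governs the handshake in our setup is an assumption, not something the logs above confirm.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class HandshakeTimeoutDemo {

    /**
     * Connects to a local server that accepts the TCP connection but never answers
     * the TLS ClientHello, and returns how many milliseconds startHandshake() blocked.
     */
    public static long measureStalledHandshakeMs(int connectTimeoutMs, int soTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The OS typically completes the TCP handshake via the listen backlog,
        // even though this server never calls accept() or reads any bytes.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback)) {
            SSLSocket tls = (SSLSocket) SSLSocketFactory.getDefault().createSocket();
            try {
                // The connect timeout bounds only the TCP connect, which succeeds quickly here.
                tls.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()), connectTimeoutMs);
                // SO_TIMEOUT bounds each blocking read, including the wait for the ServerHello.
                // With 0 (infinite), startHandshake() below would never return.
                tls.setSoTimeout(soTimeoutMs);
                long start = System.nanoTime();
                try {
                    tls.startHandshake();
                } catch (IOException expected) {
                    // Typically a SocketTimeoutException once soTimeoutMs has elapsed.
                }
                return (System.nanoTime() - start) / 1_000_000;
            } finally {
                try {
                    tls.close();
                } catch (IOException ignored) {
                    // Best-effort cleanup of a half-open TLS socket.
                }
            }
        } catch (IOException setupFailure) {
            throw new UncheckedIOException(setupFailure);
        }
    }

    public static void main(String[] args) {
        // A connect timeout much smaller than the read timeout.
        long elapsed = measureStalledHandshakeMs(100, 1000);
        System.out.println("connect timeout 100 ms, SO_TIMEOUT 1000 ms, handshake blocked ~" + elapsed + " ms");
    }
}
```

The printed duration tracks the `SO_TIMEOUT` value rather than the connect timeout, which matches the pattern in our logs: a short connect timeout followed by a much longer wait during the handshake.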

Is there a misconfiguration causing this, or is it some other issue?

Our service is an ECS-based service using AWS SDK 1.x.

PS: If you are voting to close, please leave a comment explaining why this question should be closed. I would be happy to close it myself if there is an acceptable reason.
