Redis/Tile38 client timeout when server is running nominally


I am getting timeouts when writing a large volume of data to a Tile38 (Redis) service hosted on AWS ECS Fargate from another service that is also hosted on ECS.

The client application is written in C# using the StackExchange.Redis library.

Here is the command I'm using to send the data. Each write is fairly large (~500 bytes), and writes happen at a rate of about 3-6k per second.

_redis.ExecuteAsync("SET", args).ContinueWith(x =>
{
    _logger.LogError(x.Exception, "Server responded with failure to SET");

}, TaskContinuationOptions.OnlyOnFaulted).ConfigureAwait(false);
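
For context, the call above is fire-and-forget: nothing in it limits how many SET commands are outstanding at once. A minimal sketch of what capping in-flight writes might look like follows. It assumes a hypothetical `BoundedSender` helper, which is not part of StackExchange.Redis, and the cap value would be a guess to tune.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of StackExchange.Redis): caps the number of
// operations in flight so the producer waits instead of letting the client's
// outbound queue grow without bound.
public sealed class BoundedSender
{
    private readonly SemaphoreSlim _slots;

    public BoundedSender(int maxInFlight)
    {
        _slots = new SemaphoreSlim(maxInFlight, maxInFlight);
    }

    // Waits for a free slot, runs the send, then frees the slot,
    // whether the send succeeded or threw.
    public async Task SendAsync(Func<Task> send)
    {
        await _slots.WaitAsync().ConfigureAwait(false);
        try
        {
            await send().ConfigureAwait(false);
        }
        finally
        {
            _slots.Release();
        }
    }
}
```

With something like this, the write would become `await sender.SendAsync(() => _redis.ExecuteAsync("SET", args));`. The caller is then held back once the cap is reached, which is the point of the sketch. Whether it helps in this case is untested.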

The client is showing large memory usage but not really any issue with CPU usage. The server is running perfectly happily at low CPU and low memory usage. Both the client and server are in the same ECS cluster in the same AZ.

The client task is configured with 2048 CPU units (2 vCPU) and 4096 MiB of memory. The server task is configured with 8192 CPU units (8 vCPU) and 16384 MiB of memory.

The server is showing no errors in the console even with very verbose logging turned on. The client log is full of exceptions like the following.

Timeout awaiting response (outbound=190778KiB, inbound=5211KiB, 79133ms elapsed, timeout is 10000ms), 
command=UNKNOWN, next: SET, inst: 0, qu: 165687, qs: 7913, aw: True, 
bw: Flushing, rs: ReadAsync, ws: Flushing, in: 9036, 
in-pipe: 0, out-pipe: 524832, last-in: 28, cur-in: 0, 
sync-ops: 0, async-ops: 561192, 
serverEndpoint: tile38-leader.geofence.private:9851, conn-sec: 256.11, 
aoc: 0, mc: 1/1/0, mgr: 10 of 10 available, 
clientName: ip-dfgdf(SE.Redis-v2.6.111.64013), 
IOCP: (Busy=0,Free=1000,Min=1,Max=1000), 
WORKER: (Busy=3,Free=32764,Min=2,Max=32767), 
POOL: (Threads=8,QueuedItems=892,CompletedItems=612032), 
v: 2.6.111.64013 
(Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

I've reviewed the doc and, from what I can tell, it says I don't have a thread issue but still have a large number of bytes in the read queue. Does this indicate it's a network issue?

How can I tell where my bottleneck is? How do I get rid of it?
