Infinispan HOTROD client throwing intermittent java.net.SocketTimeoutException: GetOperation

1.4k Views Asked by At

I'm using Infinispan HOTROD into java application running on IBM Liberty app server with JDK8.

HOTROD client (lib 12.1.11.Final-redhat-00001 version) is implemented via `

org.infinispan.jcache.remote.JCachingProvider :

@PostConstruct
private void setUp() {
      LOGGER.info("START [setUp] CACHE");
     
      File conf = new File(System.getenv("CLIENT_HOTROD_FILE_PATH"));
     
      URI uri = conf.toURI();
    
      // Retrieve the system wide cache manager via org.infinispan.jcache.remote.JCachingProvider
      javax.cache.CacheManager cacheManager = Caching.getCachingProvider("org.infinispan.jcache.remote.JCachingProvider")
            .getCacheManager(uri, this.getClass().getClassLoader(), null);
      
      this.cache = cacheManager.getCache(DATAGRIDKEY);
     
      LOGGER.info("END [setUp] cache " + this.cache.getName() );
  }

HOT-ROD client config file:

infinispan.client.hotrod.server_list=server1.x.xx.xxx:11222;server2.x.xx.xxx:11222;server3..x.xx.xxx:11222;server4..x.xx.xxx:11222
infinispan.client.hotrod.auth_username=user-app
infinispan.client.hotrod.auth_password=password
infinispan.client.hotrod.auth_realm=default
infinispan.client.hotrod.sasl_mechanism=SCRAM-SHA-512
infinispan.client.hotrod.client_intelligence=HASH_DISTRIBUTION_AWARE

HOT-ROD client config properties available

Otherwise the config server REDHAT Data Grid (8.2.3 version) is the followed:

{
  "distributed-cache": {
    "mode": "ASYNC",
    "remote-timeout": 17500,
    "state-transfer": {
      "timeout": 60000
    },
    "encoding": {
      "key": {
        "media-type": "text/plain"
      },
      "value": {
        "media-type": "application/x-protostream"
      }
    },
    "locking": {
      "concurrency-level": 1000,
      "acquire-timeout": 15000,
      "striping": false
    },
    "statistics": true
  }
}  

Sometimes when the application has some requests into the log application side, occurs the java.net.SocketTimeoutException: GetOperation issue as the followed stacktrace:

[2/15/22 17:24:39:445 CET] 00000573 HOTROD W org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation handleException ISPN004098: Closing connection [id: 0x9a1a8fb2, L:/10.0.20.160:55295 ! R:10.0.18.97/10.0.18.97:11222] due to transport error java.net.SocketTimeoutException: GetOperation{Cache-Name-Test, key=[B0x4A6C636F6D2E6475636174692E77612E..[110], flags=1, connection=10.0.18.97/10.0.18.97:11222} timed out after 60000 ms
at org.infinispan.client.hotrod.impl.operations.HotRodOperation.run(HotRodOperation.java:185)
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:822) 

The issue is more frequent when the app receives a massive GET KEY requests, but sometimes the issue is occurred during an unmassive load requests.

Could you know how to resolve the issue ? Do you know any suggestion ?

Thanks

2

There are 2 best solutions below

1
On

I tried to edit the client config HOT-ROD to understand the behavior of the client, below I want to summarize the new configuration changes :

infinispan.client.hotrod.server_list=server1:11222;server2:11222;server3:11222;server4:11222
infinispan.client.hotrod.auth_username=user-app
infinispan.client.hotrod.auth_password=password
infinispan.client.hotrod.auth_realm=default
infinispan.client.hotrod.sasl_mechanism=SCRAM-SHA-512

infinispan.client.hotrod.socket_timeout=15000
infinispan.client.hotrod.connect_timeout=60000
infinispan.client.hotrod.max_retries=5

# Connection pool
infinispan.client.hotrod.connection_pool.max_active=50
infinispan.client.hotrod.connection_pool.exhausted_action=WAIT
infinispan.client.hotrod.connection_pool.max_wait=20000
infinispan.client.hotrod.connection_pool.min_idle=10
#default is 1800000
infinispan.client.hotrod.connection_pool.min_evictable_idle_time=600000
infinispan.client.hotrod.connection_pool.max_pending_requests=846

I noticed that changing infinispan.client.hotrod.socket_timeout=15000 millisecond. Whatching app log file I seen the same error, but currently, instead 60000 ms, is to 15000 ms the socketTimeoutException. With the new setting the thread request successfully completing the thread request even if the exception is logged.

I think to understand that into the pool sometimes it seems there are connections died and the client search and try another connection alive into the pool retrying the GetOperation.

As written inorg.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation class code into library hot-rod:

/**
 * Base class for all the operations that need retry logic: if the operation fails due to connection problems, try with
 * another available connection.
 *
 * @author [email protected]
 * @since 4.1
 */

In a nutshell I need to keep alive connections into the pool, I don't know which is the property for Data Grid client HOT-ROD. The next step I would like to try to set infinispan.client.hotrod.tcp_keep_alive=true to test if the behavior is changed, after that I will update the post.

0
On

I tried to change some configuration properties, considering the reasonable trade-off beetween retry operations and timeouts.

HOT-ROD client config:

infinispan.client.hotrod.server_list=server1:11222;server2:11222;server3:11222;server4:11222
infinispan.client.hotrod.auth_username=app-user
infinispan.client.hotrod.auth_password=password
infinispan.client.hotrod.auth_realm=default
infinispan.client.hotrod.sasl_mechanism=SCRAM-SHA-512


infinispan.client.hotrod.socket_timeout=5000
infinispan.client.hotrod.connect_timeout=20000
infinispan.client.hotrod.max_retries=5

# Connection pool
infinispan.client.hotrod.connection_pool.max_active=100
infinispan.client.hotrod.connection_pool.exhausted_action=WAIT
infinispan.client.hotrod.connection_pool.max_wait=20000
infinispan.client.hotrod.connection_pool.min_idle=50
#default is 1800000
infinispan.client.hotrod.connection_pool.min_evictable_idle_time=600000
infinispan.client.hotrod.connection_pool.max_pending_requests=846

infinispan.client.hotrod.client_intelligence=HASH_DISTRIBUTION_AWARE
infinispan.client.hotrod.force_return_values=false

#default is false
infinispan.client.hotrod.tcp_keep_alive=true 

The changes have been done are:

  • infinispan.client.hotrod.socket_timeout reduced to 5000ms
  • infinispan.client.hotrod.connection_pool.min_idle incremented to 50
  • infinispan.client.hotrod.connect_timeout reduced to 20000ms
  • infinispan.client.hotrod.max_retries reduced to 5
  • infinispan.client.hotrod.connection_pool.max_active incremented to 100

Tested in QA env and from log application, I can say the HOT-ROD client rarely after 4 RetryOnFailureOperations obtained

java.net.SocketTimeoutException: GetOperation`timed out after 5000 ms

occurs the updateTopologyInfo action client HOT-ROD side. This HOT-ROD client policy it seems to be only the log generated from the Async thread runs in background to check the sockets are still alive.

[2/23/22 9:06:12:923 CET] 000001ab HOTROD        W org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation handleException ISPN004098: Closing connection [id: 0xa5345361, L:/10.0.20.160
:44021 - R:10.0.18.102/10.0.18.102:11222] due to transport error
                                 java.net.SocketTimeoutException: GetOperation{MY-CACHE, key=[B0x4A4B636F6D2E6475636174692E77612E..[77], flags=1, connection=10.0.18.102/10.0.18.102:11222} timed out a
fter 5000 ms
        at org.infinispan.client.hotrod.impl.operations.HotRodOperation.run(HotRodOperation.java:185)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.lang.Thread.run(Thread.java:822)

[2/23/22 9:06:17:924 CET] 000001a5 HOTROD        W org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation handleException ISPN004098: Closing connection [id: 0x54dacb9b, L:/10.0.20.160:42115 - R:10.0.18.97/10.0.18.97:11222] due to transport error
                                 java.net.SocketTimeoutException: GetOperation{MY-CACHE, key=[B0x4A4B636F6D2E6475636174692E77612E..[77], flags=1, connection=10.0.18.97/10.0.18.97:11222} timed out after 5000 ms
        at org.infinispan.client.hotrod.impl.operations.HotRodOperation.run(HotRodOperation.java:185)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.lang.Thread.run(Thread.java:822)

[2/23/22 9:06:22:925 CET] 000001a5 HOTROD        W org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation handleException ISPN004098: Closing connection [id: 0xe3e6fd98, L:/10.0.20.160:43629 - R:10.0.18.101/10.0.18.101:11222] due to transport error
                                 java.net.SocketTimeoutException: GetOperation{MY-CACHE, key=[B0x4A4B636F6D2E6475636174692E77612E..[77], flags=1, connection=10.0.18.101/10.0.18.101:11222} timed out after 5000 ms
                                     at org.infinispan.client.hotrod.impl.operations.HotRodOperation.run(HotRodOperation.java:185)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.lang.Thread.run(Thread.java:822)

[2/23/22 9:06:27:926 CET] 000001ab HOTROD        W org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation handleException ISPN004098: Closing connection [id: 0x52a4bd4a, L:/10.0.20.160:32912 - R:10.0.18.96/10.0.18.96:11222] due to transport error
                                 java.net.SocketTimeoutException: GetOperation{MY-CACHE, key=[B0x4A4B636F6D2E6475636174692E77612E..[77], flags=1, connection=10.0.18.96/10.0.18.96:11222} timed out after 5000 ms
        at org.infinispan.client.hotrod.impl.operations.HotRodOperation.run(HotRodOperation.java:185)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.lang.Thread.run(Thread.java:822)

[2/23/22 9:06:27:927 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory updateTopologyInfo ISPN004014: New server added(server4:11222), adding to the pool.
[2/23/22 9:06:27:927 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory updateTopologyInfo ISPN004014: New server added(server2:11222), adding to the pool.
[2/23/22 9:06:27:928 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory updateTopologyInfo ISPN004014: New server added(server3:11222), adding to the pool.
[2/23/22 9:06:27:928 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory updateTopologyInfo ISPN004014: New server added(server1:11222), adding to the pool.
[2/23/22 9:06:27:928 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory closeChannelPools ISPN004016: Server not in cluster anymore(10.0.18.101:11222), removing from the pool.
[2/23/22 9:06:27:928 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory closeChannelPools ISPN004016: Server not in cluster anymore(10.0.18.102:11222), removing from the pool.
[2/23/22 9:06:27:928 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory closeChannelPools ISPN004016: Server not in cluster anymore(10.0.18.96:11222), removing from the pool.
[2/23/22 9:06:27:929 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory closeChannelPools ISPN004016: Server not in cluster anymore(10.0.18.97:11222), removing from the pool.
[2/23/22 9:06:27:929 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory closeChannelPools ISPN004016: Server not in cluster anymore(10.0.18.101:11222), removing from the pool.
[2/23/22 9:06:27:929 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory closeChannelPools ISPN004016: Server not in cluster anymore(10.0.18.102:11222), removing from the pool.
[2/23/22 9:06:27:929 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory closeChannelPools ISPN004016: Server not in cluster anymore(10.0.18.96:11222), removing from the pool.
[2/23/22 9:06:27:929 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory closeChannelPools ISPN004016: Server not in cluster anymore(10.0.18.97:11222), removing from the pool.
[2/23/22 9:06:28:059 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.protocol.Codec20 readNewTopologyAndHash ISPN004006: Server sent new topology view (id=108, age=0) containing 4 addresses: [10.0.18.101:11222, 10.0.18.102:11222, 10.0.18.96:11222, 10.0.18.97:11222]
[2/23/22 9:06:28:060 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory updateTopologyInfo ISPN004014: New server added(10.0.18.101:11222), adding to the pool.
[2/23/22 9:06:28:062 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory updateTopologyInfo ISPN004014: New server added(10.0.18.102:11222), adding to the pool.
[2/23/22 9:06:28:063 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory updateTopologyInfo ISPN004014: New server added(10.0.18.96:11222), adding to the pool.
[2/23/22 9:06:28:065 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory updateTopologyInfo ISPN004014: New server added(10.0.18.97:11222), adding to the pool.
[2/23/22 9:06:28:082 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory closeChannelPools ISPN004016: Server not in cluster anymore(server4:11222), removing from the pool.
[2/23/22 9:06:28:083 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory closeChannelPools ISPN004016: Server not in cluster anymore(server2:11222), removing from the pool.
[2/23/22 9:06:28:083 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory closeChannelPools ISPN004016: Server not in cluster anymore(server3:11222), removing from the pool.
[2/23/22 9:06:28:083 CET] 000001ab HOTROD        I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory closeChannelPools ISPN004016: Server not in cluster anymore(server1:11222), removing from the pool.
16576429:[2/23/22 9:06:33:056 CET] 000001ab HOTROD        W org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation handleException ISPN004098: Closing connection [id: 0x4299851e, L:/10.0.20.160:45567 ! R:server4/10.0.18.102:11222] due to transport error
16576703-                                 java.net.SocketTimeoutException: GetOperation{MY-CACHE, key=[B0x4A4B636F6D2E6475636174692E77612E..[77], flags=1, connection=server4/10.0.18.102:11222} timed out after 5000 ms
16576925-       at org.infinispan.client.hotrod.impl.operations.HotRodOperation.run(HotRodOperation.java:185)
16577020-       at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
16577090-       at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
16577173-       at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
16577268-       at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
16577371-       at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
16577442-       at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
16577539-       at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
16577617-       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
16577701-       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
16577785-       at java.lang.Thread.run(Thread.java:822)

NOTE

Character after HOTROD word indicates the LOG level

Example:

[2/23/22 9:06:28:083 CET] 000001ab HOTROD I org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory

  • I = INFO
  • W = WARNING
  • D = DEBUG
  • E = ERROR