WildFly 26.1 cluster: 'Initial state transfer timed out for cache'


I am trying to configure WildFly 26.1 for high availability, as described here, with two servers (server A and server B) and HAProxy in front of them.

I'm using the default standalone-ha.xml configuration and everything works fine: both servers run normally, and when server B goes down, HAProxy sends the requests to server A and the user keeps working on server A without losing the session.

The problem occurs when server A holds many sessions and server B, while deploying the app, tries to fetch the current sessions from server A.

The error is:

2023-07-10 14:42:42,947 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 86) MSC000001: Failed to start service org.wildfly.clustering.infinispan.cache.web."app.war": org.jboss.msc.service.StartException in service org.wildfly.clustering.infinispan.cache.web."app.war": org.infinispan.commons.CacheException: Initial state transfer timed out for cache app.war on serverB
    at org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:66)
    at org.wildfly.clustering.service.AsyncServiceConfigurator$AsyncService.lambda$start$0(AsyncServiceConfigurator.java:117)
    at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
    at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1990)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
    at java.base/java.lang.Thread.run(Thread.java:829)
    at org.jboss.threads.JBossThread.run(JBossThread.java:513)
Caused by: org.infinispan.commons.CacheException: Initial state transfer timed out for cache app.war on serverB
    at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:249)
    at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1018)
    at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:512)
    at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:698)
    at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:644)
    at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:533)
    at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:511)
    at org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:85)
    at org.wildfly.clustering.infinispan.spi.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:77)
    at org.wildfly.clustering.infinispan.spi.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:55)
    at org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:63)
    ... 7 more

This error happens when the server needs more than 4 minutes to fetch the sessions from the other server. I have read here that the default timeout is 4 minutes.

I also found here that the default state-transfer timeout is 4 minutes.

But when I increased it, nothing changed.
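For scale, the state-transfer timeout is expressed in milliseconds: the 4-minute default corresponds to 240000 ms, while the `state-transfer` value in the configuration below (1360000 ms) works out to roughly 22.7 minutes:

```latex
\[
\frac{240\,000\ \text{ms}}{60\,000\ \text{ms/min}} = 4\ \text{min},
\qquad
\frac{1\,360\,000\ \text{ms}}{60\,000\ \text{ms/min}} \approx 22.7\ \text{min}
\]
```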

This is my current configuration:

<subsystem xmlns="urn:jboss:domain:infinispan:13.0">
    <cache-container name="ejb" default-cache="dist" marshaller="PROTOSTREAM" aliases="sfsb" modules="org.wildfly.clustering.ejb.infinispan">
        <transport lock-timeout="960000"/>
        <replicated-cache name="sso">
            <locking isolation="REPEATABLE_READ"/>
            <transaction mode="BATCH"/>
            <expiration interval="0"/>
            <state-transfer timeout="1360000"/>
        </replicated-cache>
        <distributed-cache name="dist">
            <locking acquire-timeout="10000" isolation="REPEATABLE_READ"/>
            <transaction mode="BATCH"/>
            <expiration interval="1000" lifespan="10000" max-idle="10000"/>
            <file-store/>
        </distributed-cache>
    </cache-container>
    .......
</subsystem>
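One thing worth checking: the failing service in the log is `org.wildfly.clustering.infinispan.cache.web."app.war"`, which suggests the session cache lives in the `web` cache container, whereas the configuration above raises `state-transfer` only on the `sso` cache of the `ejb` container. A minimal sketch, assuming the application uses the stock `web` container from standalone-ha.xml and that its per-deployment session cache (`app.war`) is created from that container's default `dist` cache, would put the timeout there instead. The 1360000 value is simply reused from above, not a recommendation:

```xml
<!-- Sketch only: assumes the stock "web" container of standalone-ha.xml.
     Only the default distributed cache is shown; the container's other caches stay unchanged. -->
<cache-container name="web" default-cache="dist" marshaller="PROTOSTREAM" modules="org.wildfly.clustering.web.infinispan">
    <distributed-cache name="dist">
        <locking isolation="REPEATABLE_READ"/>
        <transaction mode="BATCH"/>
        <expiration interval="0"/>
        <file-store/>
        <!-- assumed: per-deployment session caches such as "app.war" inherit this setting -->
        <state-transfer timeout="1360000"/>
    </distributed-cache>
</cache-container>
```

Assuming the standard management model, the same attribute should also be reachable from jboss-cli with `/subsystem=infinispan/cache-container=web/distributed-cache=dist/component=state-transfer:write-attribute(name=timeout,value=1360000)` followed by a `reload`.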

A related post is this one, but it doesn't solve my problem.
