Infinispan failover capability


I'm trying to leverage Infinispan's distributed task failover and I can't seem to get it to work. A little background on what I am trying to do:

I have two server nodes in a cluster. These two nodes share a distributed cache and the cache contains information used to run a task. I am trying to implement a failover feature on them so that if a task is running on server #1 and it goes down, server #2 will be able to pick up the task and complete it.

1) I first create a DistributedCallable object (MyJob):

MyJob myJob = new MyJob(param1, param2);

2) Then I create a DistributedExecutorService and a DistributedTaskBuilder and configure it with the provided Infinispan random node failover policy:

DistributedExecutorService execService =
        new DefaultExecutorService(cacheManager.getCache());
DistributedTaskBuilder<Boolean> taskBuilder = 
        execService.createDistributedTaskBuilder(myJob);
taskBuilder = taskBuilder.failoverPolicy(DefaultExecutorService.RANDOM_NODE_FAILOVER);

3) I build the DistributedTask and then run it using the DistributedExecutorService:

DistributedTask<Boolean> distTask = taskBuilder.build();
execService.submit(distTask);

During my tests, I do see that the DistributedTask gets sent to either server #1 or server #2, and that the distributed cache is being updated properly by both servers. However, when I try testing for failover, it doesn't seem to work.

For example, while the task is running on server #1 (I make the task sleep for about 20 seconds), I kill server #1, yet I never see the task re-run or picked up by server #2. The same happens when I swap the two servers.

I'm not sure if I'm missing anything; I've done this according to the Infinispan 7.0.x user guide. For failover to work, do I need to use one of the provided server modules (Hot Rod/Memcached/REST Server/WebSocket Server)? I'm using Infinispan in embedded mode (inside the actual application), and the documentation makes it seem like using the Distributed Execution Framework should provide failover.

Any help would be much appreciated!


There is 1 best solution below


It seems that for the failover policy to be applied, you need to wait for the result of the task:

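DistributedTask<Boolean> distTask = taskBuilder.build();
Future<Boolean> future = execService.submit(distTask);
// Blocks until the task completes; throws InterruptedException or ExecutionException
Boolean ignoredReturnValue = future.get();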

The downside is that if the originator node (where you call this code) crashes, the failover policy cannot be applied.
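
To make both points concrete, here is a minimal plain-JDK sketch of the idea, assuming the retry logic lives inside the future held by the originator. This is only a simulation and not Infinispan's actual implementation: `FailoverOnGetSketch`, `Node`, `FailoverFuture`, `crashedNode` and `healthyNode` are all hypothetical names, and the "nodes" are just wrappers around a local thread pool. It shows that nothing is resubmitted until someone calls `get()`, and that the retry state disappears if the caller itself goes away.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-JDK simulation of "failover happens inside get()".
// Every name here is hypothetical; none of it is Infinispan API.
class FailoverOnGetSketch {

    /** Stand-in for a cluster node that accepts a task and returns its Future. */
    interface Node {
        Future<Boolean> run(Callable<Boolean> task);
    }

    /** Lives on the originator; resubmits to another node only when get() is called. */
    static final class FailoverFuture {
        private final Callable<Boolean> task;
        private final List<Node> nodes;
        private final int maxFailoverAttempts;
        private Future<Boolean> current;
        private int target = 0;

        FailoverFuture(Callable<Boolean> task, List<Node> nodes, int maxFailoverAttempts) {
            this.task = task;
            this.nodes = nodes;
            this.maxFailoverAttempts = maxFailoverAttempts;
            this.current = nodes.get(0).run(task); // first attempt goes to node 0
        }

        Boolean get() throws InterruptedException, ExecutionException {
            int failovers = 0;
            while (true) {
                try {
                    return current.get();
                } catch (ExecutionException e) {
                    // The failure is only observed here, so this is the only
                    // place where a retry on another node can be started.
                    if (failovers++ >= maxFailoverAttempts) {
                        throw e;
                    }
                    target = (target + 1) % nodes.size();
                    current = nodes.get(target).run(task);
                }
            }
        }
    }

    /** Simulates a node that dies while running the task. */
    static Node crashedNode(ExecutorService pool) {
        Callable<Boolean> failing = () -> {
            throw new IllegalStateException("node left the cluster");
        };
        return task -> pool.submit(failing);
    }

    /** Simulates a working node and counts how many tasks it ran. */
    static Node healthyNode(ExecutorService pool, AtomicInteger runs) {
        return task -> {
            Callable<Boolean> counted = () -> {
                runs.incrementAndGet();
                return task.call();
            };
            return pool.submit(counted);
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            AtomicInteger runsOnNode2 = new AtomicInteger();
            List<Node> nodes = Arrays.asList(crashedNode(pool), healthyNode(pool, runsOnNode2));
            FailoverFuture future = new FailoverFuture(() -> Boolean.TRUE, nodes, 1);

            Thread.sleep(100); // give the first attempt time to fail
            System.out.println(runsOnNode2.get()); // prints 0: no failover yet
            System.out.println(future.get());      // prints true: retried on node 2
            System.out.println(runsOnNode2.get()); // prints 1
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the second node only ever sees the task because the caller blocked on `get()`. If you submit and walk away, or the originator dies, nobody is left to notice the failure and resubmit.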