Spring Batch Partitioner: How to make a Spring Boot app with Spring Batch functionality work as multi-node


We have a Spring Boot application designed specifically to handle Spring Batch jobs. We are using the Spring Batch partitioner approach. This approach was chosen because we needed resumability/restartability of a batch on failure on a single node. Reference: Restarting a job after server crash is not processing the unprocessed data.
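For context, a partitioner typically splits the workload by key range so that each partition can be tracked and restarted independently. The following is a minimal, framework-free sketch of that range-splitting logic (the class and partition names are illustrative, not from the original post; in Spring Batch this logic would live in a `Partitioner#partition(int gridSize)` implementation returning `ExecutionContext`s):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper: split the key range [min, max] into up to gridSize
// contiguous sub-ranges, one per partition. Spring Batch stores each
// partition's range in its ExecutionContext so a restart can resume it.
public class RangePartitioner {
    public static Map<String, long[]> partition(long min, long max, int gridSize) {
        Map<String, long[]> partitions = new LinkedHashMap<>();
        long targetSize = (max - min) / gridSize + 1;
        long start = min;
        for (int i = 0; i < gridSize && start <= max; i++) {
            long end = Math.min(start + targetSize - 1, max);
            partitions.put("partition" + i, new long[]{start, end});
            start = end + 1;
        }
        return partitions;
    }
}
```

For example, splitting IDs 1..100 over a grid size of 4 yields partition0 = [1, 25] through partition3 = [76, 100].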

Now we need to make our application multi-node by pointing multiple copies of this Spring Boot server at the same database. Is there a provision in Spring Batch that makes jobs node-aware? Meaning, if a job is being processed by node 1 and node 1 goes down, node 2 needs to pick it up from where node 1 left off.

Update, 08-Feb-2024: @Mahmoud Ben Hassine, thanks for the information. Please provide more details:

  1. When you say a centralized, shared job repository, you are referring to multiple nodes pointing to the same database (which contains the batch tables), right?
  2. Can we achieve automatic restarting? That is, if node 1 goes down while processing a batch, can another node pick it up automatically, without manual intervention to restart it?
  3. Finally, how do we maintain or fetch information on which node is processing which batch, when batches are triggered through a load balancer and each batch trigger can go to a different node?

There is 1 answer below

Answered by Mahmoud Ben Hassine

So is there a provision now in spring batch which will make jobs node aware. Meaning if the job is being processed by node1 and if node 1 goes down, node 2 needs to pick it from where node1 had left.

Yes, this is possible by design thanks to the centralized, shared job repository. If one partition fails on one node and the job is restarted, the failed partition will be restarted where it left off even on a different node. This is possible since restart data for all partitions is available in the centralized job repository, so any healthy worker node can pick up any failed partition where it left off.
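Concretely, "centralized, shared job repository" means every node's `DataSource` points at the same database holding the `BATCH_*` metadata tables. A hypothetical `application.properties` fragment shared by all nodes might look like this (the host, database, and user names are placeholders, assuming a PostgreSQL metadata store and Spring Boot's `spring.batch.jdbc.initialize-schema` property):

```properties
# All nodes must point at the SAME database so the job repository is shared
spring.datasource.url=jdbc:postgresql://db-host:5432/batchdb
spring.datasource.username=batch_user
spring.datasource.password=change-me
# Do not re-create the BATCH_* metadata tables on every node start
spring.batch.jdbc.initialize-schema=never
```

With this in place, restart metadata for every partition lives in one database, so any node restarting the job can resume the failed partitions.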

EDIT: updated answer for the follow-up questions above

  1. Yes
  2. No, you have to restart the failed job yourself (or write some code to monitor the executions and restart failed ones)
  3. You should not do that. I would recommend keeping worker location/assignment transparent (otherwise it would hinder scalability).
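For point 2, the "code to monitor the executions and restart failed ones" would, in a real application, query the repository via Spring Batch's `JobExplorer` and call `JobOperator.restart(executionId)`. The selection logic itself is simple; here is a framework-free sketch of it (class and method names are illustrative, and the status strings mirror Spring Batch's `BatchStatus` values):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative watchdog logic: given execution IDs and their statuses
// (as a JobExplorer query would report them), pick the ones to restart.
// FAILED and STOPPED executions are restartable; COMPLETED ones are not.
// A real monitor would then call JobOperator.restart(id) for each.
public class FailedExecutionSelector {
    public static List<Long> selectRestartable(Map<Long, String> executionStatuses) {
        List<Long> toRestart = new ArrayList<>();
        for (Map.Entry<Long, String> e : executionStatuses.entrySet()) {
            if ("FAILED".equals(e.getValue()) || "STOPPED".equals(e.getValue())) {
                toRestart.add(e.getKey());
            }
        }
        return toRestart;
    }
}
```

Run such a monitor on a schedule (or behind leader election, so only one node triggers restarts) to approximate the automatic failover asked about in question 2.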