Spring Batch worker pods getting scheduled on only 2 worker nodes

I am running a Spring Batch application in a Kubernetes environment. The k8s cluster has one master and three worker nodes. I am testing Spring Batch under high load, which spawns around 100 worker pods. However, all 100 pods come up on only two of the three worker nodes. No node selector or additional labeling has been applied to the nodes.

I used Spring Cloud Deployer Kubernetes to create the worker pods.

The versions involved are:

  • Spring Boot: 2.1.9.RELEASE
  • Spring Cloud: 2020.0.1
  • Spring Cloud Deployer: 2.5.0
  • Spring Cloud Task: 2.1.1.RELEASE
  • Kubernetes: 1.21

How can I ensure that worker pods get scheduled on all available worker nodes evenly?

The following is the partition handler implementation responsible for launching the tasks.

@Bean
public PartitionHandler partitionHandler(TaskLauncher taskLauncher, JobExplorer jobExplorer) {

    Resource resource = this.resourceLoader.getResource(resourceSpec);

    DeployerPartitionHandler partitionHandler = new DeployerPartitionHandler(taskLauncher, jobExplorer, resource,
        "worker");

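    // commandLineArgs is not declared in this snippet; it is presumably a List<String> field of the configuration class.
    commandLineArgs.add("--spring.profiles.active=worker");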
    commandLineArgs.add("--spring.cloud.task.initialize.enable=false");
    commandLineArgs.add("--spring.batch.initializer.enabled=false");
    commandLineArgs.add("--spring.cloud.task.closecontext_enabled=true");
    commandLineArgs.add("--logging.level.root=DEBUG");

    partitionHandler.setCommandLineArgsProvider(new PassThroughCommandLineArgsProvider(commandLineArgs));
    partitionHandler.setEnvironmentVariablesProvider(environmentVariablesProvider());
    partitionHandler.setApplicationName(appName + "worker");
    partitionHandler.setMaxWorkers(maxWorkers);

    return partitionHandler;
}

@Bean
public EnvironmentVariablesProvider environmentVariablesProvider() {
    return new SimpleEnvironmentVariablesProvider(this.environment);
}
Best answer

Posting this from the comments as a community wiki for better visibility; feel free to edit and expand.


There are scheduling mechanisms that can prevent pods from being scheduled on some nodes.

If none of these is set, it is worth trying to rejoin the node to the cluster. For instance, the node might not be registered correctly (rejoining the node solved the issue above).
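
To illustrate why a single misbehaving node can push all 100 pods onto the other two, here is a toy Java model of the scheduler's filtering step. It is only a sketch, not the Kubernetes API or the real scheduler: the `Node` class, the `eligibleNodes` method, and the node names are hypothetical, and the `ready` flag is a simplification of how Kubernetes handles nodes that are NotReady or not registered correctly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: these types are hypothetical and are NOT the Kubernetes client API.
public class SchedulingFilterSketch {

    static final class Node {
        final String name;
        final boolean ready;                 // simplified: node is registered and reports Ready
        final boolean unschedulable;         // true for a cordoned node
        final Set<String> noScheduleTaints;  // keys of taints with effect NoSchedule

        Node(String name, boolean ready, boolean unschedulable, String... noScheduleTaints) {
            this.name = name;
            this.ready = ready;
            this.unschedulable = unschedulable;
            this.noScheduleTaints = new HashSet<>(Arrays.asList(noScheduleTaints));
        }
    }

    // Returns the names of nodes a pod with the given tolerations could be placed on.
    static List<String> eligibleNodes(List<Node> nodes, Set<String> toleratedTaints) {
        List<String> eligible = new ArrayList<>();
        for (Node node : nodes) {
            // Every NoSchedule taint on the node must be tolerated by the pod.
            boolean taintsTolerated = toleratedTaints.containsAll(node.noScheduleTaints);
            if (node.ready && !node.unschedulable && taintsTolerated) {
                eligible.add(node.name);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        // Assumed scenario: the third worker is not registered correctly, so it is not ready.
        List<Node> nodes = Arrays.asList(
                new Node("worker-1", true, false),
                new Node("worker-2", true, false),
                new Node("worker-3", false, false));

        // Worker pods with no tolerations can only land on the first two nodes.
        System.out.println(eligibleNodes(nodes, Collections.<String>emptySet()));
        // prints [worker-1, worker-2]
    }
}
```

On a real cluster, the equivalent checks are `kubectl get nodes` for the node status and `kubectl describe node <node-name>` for the `Taints` and `Unschedulable` fields.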