Batch job definition: How to run a dynamically-calculated number of partitions?


As a newbie to the Batch Processing API (JSR-352), I am having difficulty modeling the following (simplified) scenario:

  1. Suppose we have a Batchlet that produces a dynamic set of files in a first step.
  2. In a second step, all these files must be processed individually in chunks (via ItemReader, ItemProcessor and ItemWriter), resulting in a new set of files.
  3. In a third step these new files need to be packaged in one large archive.

I couldn't find a way to define the second step because the specification doesn't seem to provide a loop construct (and in my understanding partition, split and flow only work for a set with a known fixed size).

What could the job XML definition look like? Do I have to give up on the idea of chunking in the second step, or do I have to divide the task into multiple jobs? Is there another option?

Best answer:

You can use a PartitionMapper to programmatically define a dynamic number of partitions for a partitioned step.

The mapper needs to create a PartitionPlan object which sets the number of partitions and provides partition-specific properties for each.

Your mapper's mapPartitions() method will look something like this outline:

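import java.util.Properties;

import javax.batch.api.partition.PartitionMapper;
import javax.batch.api.partition.PartitionPlan;
import javax.batch.api.partition.PartitionPlanImpl;
import javax.inject.Named;

// @Named (CDI) lets the job XML refer to this artifact as "myMapper";
// the class name is only an example, and batch.xml is an alternative to CDI
@Named
public class MyMapper implements PartitionMapper {

    @Override
    public PartitionPlan mapPartitions() throws Exception {

        int numPartitions = // calculate number of partitions, however you want

        // create an array of Properties objects, one for each partition
        Properties[] props = new Properties[numPartitions];

        for (int i = 0; i < numPartitions; i++) {
            // create a Properties object for this partition
            props[i] = new Properties();

            props[i].setProperty("abc", ...);
            props[i].setProperty("xyz", ...);
        }

        // use the built-in PartitionPlanImpl from the spec or your own impl
        PartitionPlan partitionPlan = new PartitionPlanImpl();
        partitionPlan.setPartitions(numPartitions);

        // set the Properties[] onto your plan
        partitionPlan.setPartitionProperties(props);

        return partitionPlan;
    }
}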
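For the scenario in the question, the application-specific part of mapPartitions() is computing numPartitions and the per-partition values from whatever the first step produced. A minimal sketch, assuming the first step's Batchlet writes its files into a directory the mapper knows about (for example via a job parameter), could build the Properties[] with one entry per file. The class name FilePartitionProperties and the property keys inputFile and outputFile are hypothetical; the helper uses only the JDK so it can be tested outside a batch runtime.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: one Properties object per input file, i.e. one partition per file.
// A PartitionMapper could pass the result to setPartitions(props.length)
// and setPartitionProperties(props).
class FilePartitionProperties {

    // Lists the regular files in inputDir (sorted, so the file-to-partition
    // numbering is deterministic) and maps each to hypothetical
    // "inputFile"/"outputFile" properties.
    public static Properties[] forDirectory(Path inputDir, Path outputDir) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(inputDir)) {
            files = entries.filter(Files::isRegularFile)
                           .sorted()
                           .collect(Collectors.toList());
        }

        Properties[] props = new Properties[files.size()];
        for (int i = 0; i < files.size(); i++) {
            Path in = files.get(i);
            props[i] = new Properties();
            props[i].setProperty("inputFile", in.toString());
            // hypothetical naming scheme for the second step's output files
            props[i].setProperty("outputFile", outputDir.resolve(in.getFileName() + ".out").toString());
        }
        return props;
    }
}
```

Because the number of partitions is simply props.length, the plan adapts to however many files the first step created. How your runtime treats a plan with zero partitions (an empty directory) may be worth checking separately.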

You then register the mapper in the step's partition element and reference the partition-specific property values through property substitution, the same way you reference statically defined partition properties:

    <step id="step2">
        <batchlet ref="myBatchlet">
            <properties>
                <property name="propABC" value="#{partitionPlan['abc']}" />
                <property name="propXYZ" value="#{partitionPlan['xyz']}" />
            </properties>
        </batchlet>
        <partition>
            <mapper ref="myMapper" />
        </partition>
    </step>
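Since the question's second step is a chunk step, the same substitution can feed a partitioned ItemReader instead of a batchlet, so chunking does not have to be given up. The sketch below shows only the per-partition reading logic, assuming a hypothetical inputFile partition property like the one built above. In a real job the class would extend javax.batch.api.chunk.AbstractItemReader and receive the path via @Inject @BatchProperty; those parts are omitted, and the method shapes only mirror open/readItem/checkpointInfo/close, so the class compiles with the JDK alone.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical per-partition reader. Each partition gets its own instance,
// so each one reads a different file.
class PartitionLineReader {
    // In a job this would be injected from "#{partitionPlan['inputFile']}"
    private final String inputFile;
    private BufferedReader reader;
    private int linesRead;

    PartitionLineReader(String inputFile) {
        this.inputFile = inputFile;
    }

    // Mirrors ItemReader.open(checkpoint): on restart, skip the lines already read.
    public void open(Integer checkpoint) throws IOException {
        reader = Files.newBufferedReader(Paths.get(inputFile), StandardCharsets.UTF_8);
        linesRead = 0;
        int skip = (checkpoint == null) ? 0 : checkpoint;
        while (linesRead < skip && reader.readLine() != null) {
            linesRead++;
        }
    }

    // Mirrors ItemReader.readItem(): returning null signals the end of this partition's input.
    public String readItem() throws IOException {
        String line = reader.readLine();
        if (line != null) {
            linesRead++;
        }
        return line;
    }

    // Mirrors ItemReader.checkpointInfo(): the number of lines consumed so far.
    public Integer checkpointInfo() {
        return linesRead;
    }

    public void close() throws IOException {
        if (reader != null) {
            reader.close();
        }
    }
}
```

Keeping the checkpoint as a simple line count is one possible design; anything Serializable that lets the reader reposition itself would work.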