What is the requirement?
I'm using a Spring Batch application for bulk file processing.
Steps:
- Reading the file using network calls.
- Preparing a bulk JSON payload and calling the endpoint.
- Writing responses to files.
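The processor and writer beans themselves are not shown below. Conceptually, the processor turns one chunk of parsed lines into a single bulk JSON payload and posts it to the endpoint; the following is only an illustrative sketch of that shape (the class name, HTTP client, and endpoint are placeholders, not the real implementation):

public class BulkJsonProcessor implements ItemProcessor<List<FileStructure>, CardExtractOutputList> {

    private final RestTemplate restTemplate; // placeholder HTTP client
    private final String endpointUrl;        // placeholder endpoint

    public BulkJsonProcessor(RestTemplate restTemplate, String endpointUrl) {
        this.restTemplate = restTemplate;
        this.endpointUrl = endpointUrl;
    }

    @Override
    public CardExtractOutputList process(List<FileStructure> chunk) {
        // one bulk JSON payload per chunk -> one blocking HTTP call
        return restTemplate.postForObject(endpointUrl, chunk, CardExtractOutputList.class);
    }
}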
What is going wrong?
Everything works, but it is very slow: a small file with 25K records usually takes about 16 minutes to process, because of the steps below:
- When the reader hands a chunk to the processor, it blocks and waits for the response.
- Once the response is ready, it blocks again for the write operation, which uses slow I/O.
Suppose:
- Time taken for reading and preparing the JSON: 2s [READER]
- Time taken for the endpoint call: 2s [Processor]
- Time taken for writing: 1s [Writer]
Single-threaded blocking calls: READER --> Processor --> Writer // total: 5 seconds per request
How do I want to process it?
Multi-threaded blocking calls:
          |--> Processor --> Writer
READER -->|--> Processor --> Writer
          |--> Processor --> Writer
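To make the intent concrete: outside of Spring Batch, the flow I'm after would look roughly like the sketch below, with a single reading thread and the blocking process-and-write work fanned out to a small pool (purely illustrative; readNextChunk, callEndpoint and writeToFile are placeholders, not real methods):

void processFile() throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(10);
    List<Future<?>> inFlight = new ArrayList<>();

    List<FileStructure> chunk;
    while ((chunk = readNextChunk()) != null) {                  // single-threaded reading (~2s per chunk)
        final List<FileStructure> current = chunk;
        inFlight.add(pool.submit(() -> {
            CardExtractOutputList response = callEndpoint(current); // blocking HTTP call (~2s)
            writeToFile(response);                                   // blocking file write (~1s)
        }));
    }
    for (Future<?> future : inFlight) {
        future.get();                                            // wait for all in-flight chunks
    }
    pool.shutdown();
}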
Configuration used:
@Bean
public PlatformTransactionManager transactionManager() {
return new JpaTransactionManager();
}
@Bean
@Autowired
Step step1(JobRepository jobRepository) {
PlatformTransactionManager transactionManager = transactionManager();
StepBuilder stepBuilder = new StepBuilder("CARD TRANSFORMATION", jobRepository);
return stepBuilder
.<List<FileStructure>, CardExtractOutputList>chunk(1, transactionManager) // commit interval 1: each "item" is already a List of records built by the reader
.reader(generalFileReader.reader(""))
.processor(cardExtractFileProcessor)
.writer(cardExtractFileWriter)
.taskExecutor(taskExecutor()) // multi-threaded step: each whole chunk (read -> process -> write) runs on a pool thread
.faultTolerant()
.retryLimit(3)
.retry(RuntimeException.class)
.build();
}
@Bean(name = "jsob")
@Autowired
Job cardExtractFilejob(JobRepository jobRepository) {
JobBuilder jobBuilderFactory =
new JobBuilder("somename", jobRepository)
.incrementer(new RunIdIncrementer())
.listener(preJobListener)
.listener(postJobExecution);
return jobBuilderFactory.flow(step1(jobRepository)).end().build();
}
@Bean
public TaskExecutor taskExecutor() {
SimpleAsyncTaskExecutor asyncTaskExecutor = new SimpleAsyncTaskExecutor();
asyncTaskExecutor.setConcurrencyLimit(10);
return asyncTaskExecutor;
}
Custom Reader:
@Bean
@StepScope
@SneakyThrows
public MultiLinePeekableReader reader(
@Value(FILENAME_JOB_PARAM) final String fileName) {
FlatFileItemReader<FileStructure> itemReader = new FlatFileItemReader<>();
final String gcsLocationOfFile =
FilesUtility.getAbsoluteGCSPathOfAFile(fileName, gcsRelatedConfiguration);
final Resource resource = applicationContext.getResource(gcsLocationOfFile);
itemReader.setResource(resource);
itemReader.setName("FileReader : " + fileName);
itemReader.setLineMapper(lineMapper());
itemReader.setStrict(true);
MultiLinePeekableReader multiLinePeekableReader = new MultiLinePeekableReader(fileName);
multiLinePeekableReader.setDelegate(itemReader);
return multiLinePeekableReader;
}
private LineMapper<FileStructure> lineMapper() {
DefaultLineMapper<FileStructure> lineMapper = new DefaultLineMapper<>();
DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
..
return lineMapper;
}
}
MultiLinePeekableReader:
public class MultiLinePeekableReader implements ItemReader<List<FileStructure>>, ItemStream {
private SingleItemPeekableItemReader<FileStructure> delegate;
..
@Override
@SneakyThrows
public synchronized List<FileStructure> read() {
List<FileStructure> records = null;
int readCount = fileProcessingConfiguration.itemsPerRead();
try {
for (FileStructure line; (line = this.delegate.read()) != null; ) {
seqNo = seqNo.add(new BigInteger(FileProcessingConstants.NUMBER_STRING_ONE));
line.setSequenceNo(seqNo.toString());
line.setMaskedSensitiveData(
FilesUtility.getMaskedSensitiveDataFromData(
line.getSensitiveData(),
fileProcessingConfiguration.leadingPersistCount(),
fileProcessingConfiguration.trailingPersistCount()));
if (readCount == fileProcessingConfiguration.itemsPerRead()) {
// first record of a new batch: start a fresh list
records = new ArrayList<>();
records.add(line);
readCount--;
} else {
records.add(line);
readCount--;
// peek ahead and return the batch once itemsPerRead records are collected or the input ends
FileStructure nextLine = this.delegate.peek();
if (nextLine == null || readCount == 0) {
readCount = fileProcessingConfiguration.itemsPerRead();
return records;
}
}
}
} catch (FlatFileParseException parseException) {
if (records == null) {
records = new ArrayList<>();
}
..
}
return records;
}
@Override
public void close() throws ItemStreamException {
this.delegate.close();
}
@Override
public void open(ExecutionContext executionContext) throws ItemStreamException {
this.delegate.open(executionContext);
}
@Override
public void update(ExecutionContext executionContext) throws ItemStreamException {
this.delegate.update(executionContext);
}
public void setDelegate(FlatFileItemReader<FileStructure> delegate) {
this.delegate = new SingleItemPeekableItemReader<>();
this.delegate.setDelegate(delegate);
}
}
Answers I have already read but did not find helpful:
Spring batch single threaded reader and multi threaded writer
Any help would be really appreciated!
Answer:
The concurrency model you are looking for is not possible in Spring Batch as it is currently designed. The two single-JVM scaling options are multi-threaded steps or partitioned steps.
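For example, a partitioned step splits the input up front and runs a copy of the worker step for each partition, each on its own thread from the task executor. A rough sketch (the partition key, grid size and bean names are illustrative, and the worker step's reader would have to honour the partition boundaries stored in its ExecutionContext):

@Bean
public Step managerStep(JobRepository jobRepository, Step workerStep, TaskExecutor taskExecutor) {
    return new StepBuilder("managerStep", jobRepository)
        .partitioner("workerStep", partitioner()) // one ExecutionContext per slice of the file
        .step(workerStep)
        .gridSize(10)                             // number of partitions, illustrative
        .taskExecutor(taskExecutor)
        .build();
}

@Bean
public Partitioner partitioner() {
    return gridSize -> {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putInt("partitionIndex", i);  // the worker's reader uses this to pick its slice
            partitions.put("partition" + i, context);
        }
        return partitions;
    };
}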
If you really want something like you suggest, you need to provide a custom ChunkProcessor that is concurrent, but it will be up to you to handle fault-tolerance and restartability.
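To give an idea of what that means, here is a bare-bones sketch of a concurrent ChunkProcessor (Spring Batch 5 types: org.springframework.batch.core.step.item.ChunkProcessor, org.springframework.batch.item.Chunk). It fans the blocking item processing out to an executor and writes the collected results, but it deliberately ignores retry, skip and restart, and you would have to wire it into the step yourself via a ChunkOrientedTasklet instead of the chunk() builder:

public class ConcurrentChunkProcessor<I, O> implements ChunkProcessor<I> {

    private final ItemProcessor<I, O> processor;
    private final ItemWriter<O> writer;
    private final ExecutorService executor;

    public ConcurrentChunkProcessor(ItemProcessor<I, O> processor, ItemWriter<O> writer, ExecutorService executor) {
        this.processor = processor;
        this.writer = writer;
        this.executor = executor;
    }

    @Override
    public void process(StepContribution contribution, Chunk<I> chunk) throws Exception {
        // fan the blocking process() calls out to the executor
        List<Future<O>> futures = new ArrayList<>();
        for (I item : chunk) {
            futures.add(executor.submit(() -> processor.process(item)));
        }
        // collect the results and write them once they are all done
        Chunk<O> outputs = new Chunk<>();
        for (Future<O> future : futures) {
            O output = future.get();
            if (output != null) { // a null result means the item was filtered out
                outputs.add(output);
            }
        }
        writer.write(outputs);
        contribution.incrementWriteCount(outputs.size());
    }
}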