Spring Batch FlatFileItemReader to read malformed lines


In my project I am using Spring Batch and reading a file with FlatFileItemReader/FieldSetMapper. There is a problem with some input files: the lines are cut/malformed for a few records.
Assume the input file has 4 columns. In a few records a quoted column value is split across two lines, as in the sample below. Can anyone please help me fix this issue? (I can explain more if needed.)
File.csv

"id","name","age","salary"
"1","user1","28","1000"
"2","user2","27","2000"
"3","user3","26
    ","3000"
"4","user4","25","
    4000"
"5","
        user5","24","5000"
"6","user6","23","6000"
"7","user7","22","7000"
"8","user8","21","8000"

There is 1 best solution below


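I had a similar issue while reading malformed lines with FlatFileItemReader. In this case, you can set a DefaultRecordSeparatorPolicy as the RecordSeparatorPolicy of the FlatFileItemReader. After each line is read, the policy's isEndOfRecord check decides whether the record is complete. If the text read so far contains an unterminated quote (an odd number of quote characters), the reader appends the next line and checks again, so a record that was split across several lines is reassembled before it is tokenized. You can also override this behavior by extending the class.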

flatFileItemReader.setRecordSeparatorPolicy(new DefaultRecordSeparatorPolicy());

Refer to the DefaultRecordSeparatorPolicy API docs for more information.
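To make the quote-balancing idea concrete, here is a minimal plain-Java sketch of the same technique. It is not the Spring Batch source: the class QuoteBalancedRecords and the method readRecords are hypothetical names made up for illustration, and isEndOfRecord only borrows the policy's method name. It also ignores the trailing continuation marker that DefaultRecordSeparatorPolicy additionally handles.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of Spring Batch.
class QuoteBalancedRecords {

    // A record is treated as complete when it holds an even number of quote characters.
    static boolean isEndOfRecord(String record) {
        long quotes = record.chars().filter(c -> c == '"').count();
        return quotes % 2 == 0;
    }

    // Joins physical lines into logical records until the quotes balance.
    static List<String> readRecords(BufferedReader reader) throws IOException {
        List<String> records = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String record = line;
            while (!isEndOfRecord(record)) {
                String next = reader.readLine();
                if (next == null) {
                    throw new IOException("Unexpected end of file inside a quoted field: " + record);
                }
                // Keep the line break so the quoted field content stays as it was read.
                record = record + "\n" + next;
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Five physical lines taken from the sample file in the question.
        String csv = String.join("\n",
                "\"2\",\"user2\",\"27\",\"2000\"",
                "\"3\",\"user3\",\"26",
                "    \",\"3000\"",
                "\"4\",\"user4\",\"25\",\"",
                "    4000\"");
        List<String> records = readRecords(new BufferedReader(new StringReader(csv)));
        System.out.println(records.size() + " records");
        for (String r : records) {
            // Show the embedded line break explicitly.
            System.out.println(r.replace("\n", "\\n"));
        }
    }
}
```

With the five physical lines above, the sketch yields three logical records. The broken rows for ids 3 and 4 are each joined back into one record, and the line break and leading spaces stay inside the quoted field.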

@Bean
public FlatFileItemReader<YourClassName> itemReader(@Value("${input}") Resource resource) {
    FlatFileItemReader<YourClassName> flatFileItemReader = new FlatFileItemReader<>();
    flatFileItemReader.setResource(resource);
    flatFileItemReader.setName("CSV-Reader");
    flatFileItemReader.setLinesToSkip(1);
    // disable the default '#' comment prefix so lines starting with '#' are not skipped
    flatFileItemReader.setComments(new String[] {});
    // join physical lines until the quotes are balanced (multi-line CSV records)
    flatFileItemReader.setRecordSeparatorPolicy(new DefaultRecordSeparatorPolicy());
    flatFileItemReader.setLineMapper(lineMapper());
    return flatFileItemReader;
}

@Bean
public LineMapper<YourClassName> lineMapper() {
    DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
    lineTokenizer.setDelimiter(DelimitedLineTokenizer.DELIMITER_COMMA);
    lineTokenizer.setQuoteCharacter(DelimitedLineTokenizer.DEFAULT_QUOTE_CHARACTER);
    lineTokenizer.setStrict(false);
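    // COLUMN_NAMES is assumed to be a String[] constant defined elsewhere, e.g. {"id", "name", "age", "salary"}
    lineTokenizer.setNames(COLUMN_NAMES);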

    BeanWrapperFieldSetMapper<YourClassName> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(YourClassName.class);

    DefaultLineMapper<YourClassName> defaultLineMapper = new DefaultLineMapper<>();
    defaultLineMapper.setLineTokenizer(lineTokenizer);
    defaultLineMapper.setFieldSetMapper(fieldSetMapper);
    return defaultLineMapper;
}