I have a requirement to read a huge flat file without keeping the entire file in memory. It is a flat file with multiple segments. Each record set starts with a header record, identified by 'H' at the beginning of the line, followed by many detail lines, and then another header record. This pattern repeats, e.g.:
HXYZ CORP 12/12/2016
R1 234 qweewwqewewq wqewe
R1 234 qweewwqewewq wqewe
R1 234 qweewwqewewq wqewe
R2 344 dfgdfgdf gfd df g
HABC LTD 12/12/2016
R1 234 qweewwqewewq wqewe
R2 344 dfgdfgdf gfd df g
HDRE CORP 12/12/2016
R1 234 qweewwqewewq wqewe
R2 344 dfgdfgdf gfd df g
R2 344 dfgdfgdf gfd df g
I want to read one record set at a time, e.g.:
HDRE CORP 12/12/2016
R1 234 qweewwqewewq wqewe
R2 344 dfgdfgdf gfd df g
R2 344 dfgdfgdf gfd df g
How can I achieve this, keeping in mind that I do not want to keep the entire file in memory? Is there a standard library I can use for this purpose? I have tried some implementations without much success. I have used Apache's LineIterator, but that reads line by line.
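Without any extra library, one way is to wrap a plain BufferedReader in an iterator that reads ahead by one line. When it sees the next 'H' line, it stops, returns the lines collected so far as one record set, and keeps that header for the following call. Only the current record set is held in memory. The sketch below assumes that every header line starts with 'H' and no detail line does, and that any lines before the first header can be skipped. The class name RecordSetIterator is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical helper: yields one record set at a time, i.e. a header line
 * starting with 'H' plus all following lines up to (not including) the next
 * header. Only the current record set is kept in memory.
 * Assumption: detail lines never start with 'H'.
 */
public class RecordSetIterator implements Iterator<List<String>> {
    private final BufferedReader reader;
    private String pendingHeader; // header already read that starts the next set

    public RecordSetIterator(BufferedReader reader) {
        this.reader = reader;
        this.pendingHeader = readFirstHeader();
    }

    // Skips any lines before the first header (assumed to be ignorable).
    private String readFirstHeader() {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("H")) {
                    return line;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return pendingHeader != null;
    }

    @Override
    public List<String> next() {
        if (pendingHeader == null) {
            throw new NoSuchElementException();
        }
        List<String> recordSet = new ArrayList<>();
        recordSet.add(pendingHeader);
        pendingHeader = null;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("H")) {
                    pendingHeader = line; // belongs to the next record set
                    break;
                }
                recordSet.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return recordSet;
    }

    public static void main(String[] args) {
        String data = "HXYZ CORP 12/12/2016\n"
                + "R1 234 qweewwqewewq wqewe\n"
                + "R2 344 dfgdfgdf gfd df g\n"
                + "HABC LTD 12/12/2016\n"
                + "R1 234 qweewwqewewq wqewe\n";
        RecordSetIterator it =
                new RecordSetIterator(new BufferedReader(new StringReader(data)));
        while (it.hasNext()) {
            System.out.println(it.next()); // one record set per line
        }
    }
}
```

For a real file, the StringReader would be replaced by something like Files.newBufferedReader(Paths.get("huge.txt")) (the file name is only illustrative), ideally inside a try-with-resources block so the reader is closed. Memory use is then bounded by the largest single record set rather than by the file size.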
Any help or suggestions will be much appreciated.
A library for that purpose is BeanIO.
There are a lot of unsupported libraries for fixed-format files out there.
Flatpack is more recent, but I haven't tried it.