How to parse large files using flatpack


I need to parse files that may be quite large: possibly hundreds of megabytes and millions of lines. I have been trying to do this with FlatPack, and I assumed the right approach would be the buffered parsers together with the new stream methods. However, although dataset.next() returns true for the correct number of records, the Optional returned by dataset.getRecord() never contains a value.

I have looked at this example/test, but it only counts the number of records and does not actually do anything with their content.


There are 2 best solutions below

1
On

You can use the class BuffReaderParseFactory instead of DefaultParserFactory.

It reads one record from the input file each time you call next(), instead of loading the whole file up front.
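To illustrate what "one record per next() call" means, here is a minimal sketch of that cursor pattern using only the JDK. This is not FlatPack's code or API: the LineCursor class and its methods are hypothetical names made up for the illustration, and it assumes a header line, a single-character delimiter, and no quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration class, NOT part of FlatPack.
// Only the current record is held in memory at any time.
final class LineCursor implements AutoCloseable {
    private final BufferedReader reader;
    private final String delimiterRegex;
    private final Map<String, Integer> columns = new HashMap<>();
    private String[] current; // fields of the record read by the last next()

    LineCursor(Reader source, char delimiter) throws IOException {
        this.reader = new BufferedReader(source);
        this.delimiterRegex = Pattern.quote(String.valueOf(delimiter));
        // Assumption: the first line holds the column names.
        String header = reader.readLine();
        if (header == null) {
            throw new IOException("missing header line");
        }
        String[] names = header.split(delimiterRegex, -1);
        for (int i = 0; i < names.length; i++) {
            columns.put(names[i].trim(), i);
        }
    }

    // Reads exactly one more line; returns false at end of input.
    boolean next() throws IOException {
        String line = reader.readLine();
        if (line == null) {
            current = null;
            return false;
        }
        current = line.split(delimiterRegex, -1);
        return true;
    }

    // Returns a field of the current record by column name.
    String getString(String column) {
        if (current == null) {
            throw new IllegalStateException("call next() first");
        }
        Integer index = columns.get(column);
        if (index == null || index >= current.length) {
            throw new IllegalArgumentException("no such column: " + column);
        }
        return current[index];
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample; a FileReader over a large file works the same way.
        Reader sample = new StringReader("id,name\n1,Ann\n2,Bob\n");
        try (LineCursor cursor = new LineCursor(sample, ',')) {
            while (cursor.next()) {
                // Handle each record here, before the next one replaces it.
                System.out.println(cursor.getString("name"));
            }
        }
    }
}
```

The point of the pattern is that each record has to be processed inside the loop, because nothing is kept once the cursor moves on. That is why memory use stays flat no matter how many lines the file has.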

0
On

The explanations for both DefaultParserFactory and BuffReaderParseFactory are not exactly helpful. Both factories are documented as returning a PZParser (from newDelimitedParser), but only one of them gave me an actual value from a record. Based on the examples I've seen, I think BuffReaderParseFactory is meant just for checking performance (and so should be faster), while DefaultParserFactory holds all the records.