Using NIO vs RandomAccessFile to read chunks of files


I want to read a large text file (several GB) and process it without loading the whole file into memory, only chunks of it. (The processing involves counting word instances.)

If I'm using a ConcurrentHashMap to process the file in parallel and make it more efficient, is there a way to use NIO or RandomAccessFile to read it in chunks? Would that make it even more efficient?

The current implementation uses a BufferedReader and goes something like this:

while(lines.size() <= numberOfLines && (line = bufferedReader.readLine()) != null) {
     lines.add(line);
}

lines.parallelStream().. // processing logic using ConcurrentHashMap
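For reference, the elided processing step is along these lines (a sketch only; the whitespace split, the countWords name, and the map layout are illustrative, not the actual code):

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ConcurrentHashMap;

    static ConcurrentHashMap<String, Long> countWords(List<String> lines) {
        ConcurrentHashMap<String, Long> wordCounts = new ConcurrentHashMap<>();
        lines.parallelStream()
             .flatMap(line -> Arrays.stream(line.split("\\s+"))) // split each line into words
             .filter(word -> !word.isEmpty())
             .forEach(word -> wordCounts.merge(word, 1L, Long::sum)); // merge() updates each key atomically
        return wordCounts;
    }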

There are 2 answers below

BEST ANSWER

RandomAccessFile only makes sense if you intend to "jump" around within the file, and your description of what you're doing doesn't sound like that. NIO makes sense if you have to cope with lots of parallel communication and want non-blocking operations, e.g. on sockets. That doesn't seem to be your use case either.
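To make the comparison concrete, reading fixed-size chunks through NIO's FileChannel would look roughly like the sketch below (the file name and the 8 MB buffer size are placeholders). Note that a chunk boundary can split a word in half, which is exactly the bookkeeping a line-oriented reader spares you.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    static void readInChunks() throws IOException {
        try (FileChannel channel = FileChannel.open(Paths.get("bigfile.txt"), StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(8 * 1024 * 1024); // arbitrary 8 MB chunk
            while (channel.read(buffer) != -1) {
                buffer.flip();   // switch from filling the buffer to reading from it
                // decode the bytes to text and count words here;
                // a word straddling two chunks has to be stitched back together
                buffer.clear();  // make the buffer ready for the next chunk
            }
        }
    }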

So my suggestion is to stick with the simple approach of using a BufferedReader on top of an InputStreamReader(FileInputStream) (don't use FileReader, because it doesn't let you specify the charset/encoding) and go through the data as you showed in your sample code. Leave out the parallelStream at first; only if you see poor performance should you try it.
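In code, that suggestion might look like this sketch (the file name is a placeholder and UTF-8 is an assumed charset):

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    static void readLineByLine() throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream("bigfile.txt"), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // count the words of this line, e.g. into a plain HashMap
            }
        }
    }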

Always remember: Premature optimization is the root of all evil.


The obvious Java 7 solution is:

    String lines = Files.readAllLines(Paths.get("file"), StandardCharsets.UTF_8)
                        .stream()
                        .reduce("", (a, b) -> a + b);

Honestly, I have no idea whether it is faster, but I guess that under the hood it does not read into a buffer, so at least in theory it should be faster.