With reference to this and this Q&As, given that Java's Scanner supports both character encoding and buffering, is there ever a need to wrap the InputStream in an InputStreamReader before passing it to the Scanner? If so, is there ever a need to wrap the InputStreamReader in a BufferedReader first?

In other words, is there ever a scenario where the following code is not redundant?

FileInputStream inputStream = new FileInputStream(fileName);
InputStreamReader inputStreamReader = new InputStreamReader(inputStream);
BufferedReader bufferedStreamReader = new BufferedReader(inputStreamReader);
Scanner scanner = new Scanner(bufferedStreamReader);

I am learning Java (from Python), and am trying to better understand the similarities, differences, and use cases for all the various IO classes.

(Edit1): I know that we could replace the first two lines with a single line using FileReader - but that is exactly what FileReader does internally, so the purpose of the example (and this question) is to understand the underlying relationships between the different IO classes - hence the explicit use of FileInputStream.
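For reference, a minimal sketch of the equivalence described above - the explicit FileInputStream + InputStreamReader stack next to the FileReader shorthand (the temp file is just so the example is self-contained; both forms use the platform's default charset):

```java
import java.io.*;

public class FileReaderEquivalence {
    public static void main(String[] args) throws IOException {
        // Write a small sample file so the example is self-contained.
        File f = File.createTempFile("example", ".txt");
        f.deleteOnExit();
        try (Writer w = new FileWriter(f)) {
            w.write("hello\n");
        }

        // Explicit form: a byte stream, plus a bytes-to-chars bridge.
        try (Reader r = new InputStreamReader(new FileInputStream(f))) {
            System.out.println((char) r.read());
        }

        // Shorthand: FileReader performs the same wrapping internally.
        try (Reader r = new FileReader(f)) {
            System.out.println((char) r.read());
        }
    }
}
```

Both readers produce identical output, which is why the two-line form is usually collapsed.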

(Edit2): This Java tutorial actually does this wrapping.

1 Answer
It's a moot point; you rarely want Scanner (it doesn't do what you probably think it does - it does what its spec says it does, but read that spec: it's weird, and not particularly suitable for much of anything). And you don't need this code - this is the old file API. There's a new one:

Path p = Path.of("/path/to/some/dir/someFileInDir.txt");
p = p.getParent().resolve("someOtherFile.txt");

try (var in = Files.newBufferedReader(p)) {
  while (true) {
    String line = in.readLine();
    if (line == null) break;
    // process line here
  }
}

You can one-liner it with Files.newBufferedReader(Path.of("/path/to/file.txt")), of course.
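Spelled out, that one-liner looks like this (a sketch; the temp file stands in for a real path, which is an assumption of the example):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OneLiner {
    public static void main(String[] args) throws IOException {
        // A placeholder file so the sketch runs as-is.
        Path p = Files.createTempFile("demo", ".txt");
        Files.writeString(p, "first\nsecond\n");

        // Path straight to a buffered, charset-aware reader in one call.
        try (BufferedReader in = Files.newBufferedReader(p)) {
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);
            }
        }

        Files.delete(p);
    }
}
```

Files.newBufferedReader defaults to UTF-8, so this also sidesteps the platform-default-charset question from the original stack.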

For what it's worth:

  • FileInputStream does not buffer. If you call .read() (i.e. the 'read 1 byte please' method), it will ask the underlying OS for just one byte. On modern SSDs, that means the OS reads an entire block, then tosses everything except the 1 byte you wanted. So yes, you should wrap it in a buffer if you, or any code you use, has any intention of calling .read() - but if you're just going to call .read(byte[]) with a byte array that's large enough (4096 bytes or so is plenty), there's no need.
  • InputStreamReader doesn't buffer either.
  • Scanner does buffer, with a default buffer size of 1024.
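The first bullet can be sketched like this - the same file read three ways, showing where a wrapper earns its keep and where a single bulk read makes it unnecessary (byte counts here are just illustrative):

```java
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferingDemo {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("buf", ".bin");
        Files.write(p, new byte[]{1, 2, 3});

        // Unbuffered single-byte reads: each .read() goes to the OS.
        try (InputStream in = new FileInputStream(p.toFile())) {
            int count = 0;
            while (in.read() != -1) count++;
            System.out.println("unbuffered reads: " + count);
        }

        // Buffered: BufferedInputStream fetches up to 8192 bytes at a time
        // (its default buffer size), so most .read() calls never touch the OS.
        try (InputStream in = new BufferedInputStream(new FileInputStream(p.toFile()))) {
            while (in.read() != -1) { /* served from the buffer */ }
        }

        // No wrapper needed: one bulk read into a sufficiently large array.
        try (InputStream in = new FileInputStream(p.toFile())) {
            byte[] chunk = new byte[4096];
            System.out.println("bulk read returned " + in.read(chunk) + " bytes");
        }

        Files.delete(p);
    }
}
```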

So, the answer is: that code you found is mostly redundant, though Scanner's 1024 is a bit low - BufferedInputStream defaults to 8192 instead, which is a better number. Still, a 1024-byte buffer gets you roughly 1024x fewer OS calls than asking for single bytes from a non-buffered, block-based source. Assuming those blocks are 8192 bytes long, Scanner without the intermediate buffer is only ~8x slower. And that's assuming 100% of the bottleneck is the 'read from disk' part - which it might be; disks are slow.

All of that is only relevant if you are hand-rolling your buffered stack, which you shouldn't (use Files.newBufferedX - check the API, there's a lot there; generally you can build what you need in one go, though there's no one-go factory for Scanner), and if you're using Scanner, which you also shouldn't.
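If you do want a Scanner over a file anyway, the closest thing to a one-go construction - given, as noted above, that Files has no Scanner factory - is something like this sketch (the temp file is a stand-in for a real path; note that Scanner also accepts a Path directly):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

public class ScannerOverFiles {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("scan", ".txt");
        Files.writeString(p, "42 hello\n");

        // Scanner over the buffered, UTF-8 reader from the new file API.
        try (Scanner sc = new Scanner(Files.newBufferedReader(p))) {
            System.out.println(sc.nextInt());
            System.out.println(sc.next());
        }

        // Scanner can also take a Path directly, no Files factory needed.
        try (Scanner sc = new Scanner(p)) {
            System.out.println(sc.nextLine());
        }

        Files.delete(p);
    }
}
```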