Why does Commons IO's IOUtils.toByteArray return different results?


Why aren't the results the same when I use Commons IO's IOUtils to get a byte[]?

I call toByteArray twice, once with an InputStream and once with a Reader:

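```java
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.InputStream;
import java.io.Reader;
import java.util.Arrays;
import org.apache.commons.io.IOUtils;

public class Main {
    public static void main(String[] args) {
        String file = "c:/c.pdf";

        // Reads the raw bytes of the file
        try (InputStream is = new FileInputStream(file)) {
            byte[] result = IOUtils.toByteArray(is);
            System.err.println(Arrays.toString(result));
        } catch (Exception e) {
            e.printStackTrace();
        }

        // FileReader decodes the bytes with the platform default charset;
        // toByteArray(reader, "gbk") then encodes those chars as GBK
        try (Reader reader = new FileReader(file)) {
            byte[] result = IOUtils.toByteArray(reader, "gbk");
            System.err.println(Arrays.toString(result));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```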

Short answer: the two results differ because the second approach is wrong. Never use a Reader to read binary data.

An InputStream reads the bytes of a file without trying to give them any meaning. A Reader, on the contrary, decodes them into characters using a specific charset. Your second example therefore reads bytes, converts them to characters, and then toByteArray() converts those characters back to bytes. This double conversion is not just pointless but lossy, because the first conversion can fail: when the Reader encounters a byte (or, in a multi-byte charset like GBK, a byte sequence) that is not valid in its charset, it substitutes a replacement character (U+FFFD). When those characters are encoded back to bytes, you get the bytes of the replacement character (or '?' if the target charset cannot represent it), not the original bytes that failed to decode.

Your code also mixes two charsets: new FileReader(file) decodes with the platform default charset, while toByteArray(reader, "gbk") encodes as GBK, so even characters that did decode successfully can come back as different bytes.
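To see the loss without a PDF or Commons IO, here is a minimal JDK-only sketch that mimics the Reader-to-bytes path: it decodes a byte array through an InputStreamReader and re-encodes the characters through an OutputStreamWriter. The class name RoundTripDemo, the helper roundTrip, and the sample bytes are hypothetical. It uses UTF-8 on both sides (rather than the question's default-charset/GBK pair) so the result is the same on every JVM; any byte sequence that is invalid in the chosen charset is replaced in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {

    // Hypothetical helper: decode bytes to chars through a Reader, then encode the
    // chars back to bytes through a Writer (the same shape as Reader -> toByteArray(reader, charset)).
    static byte[] roundTrip(byte[] data, Charset decodeAs, Charset encodeAs) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), decodeAs);
             Writer writer = new OutputStreamWriter(out, encodeAs)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        } // closing the writer flushes the encoded bytes into 'out'
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A PDF-like header followed by 0xFF, which is not a valid byte in UTF-8
        byte[] original = {'%', 'P', 'D', 'F', '-', (byte) 0xFF};
        byte[] copy = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println(Arrays.toString(original));
        System.out.println(Arrays.toString(copy));
        System.out.println("identical: " + Arrays.equals(original, copy));
    }
}
```

The ASCII header survives the round trip, but the 0xFF byte is decoded as U+FFFD and re-encoded as EF BF BD, so the two arrays are not identical.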

So the problem is not in IOUtils; it is in your use of a Reader to read a PDF.
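For completeness, a minimal sketch of reading a PDF (or any binary file) safely with only the JDK: stay on the byte-stream side and never decode. ReadBinary, readBytes, and the temp file are hypothetical stand-ins for the question's c:/c.pdf. InputStream.readAllBytes requires Java 9 or later; IOUtils.toByteArray(InputStream) from the question's first example does the same job.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ReadBinary {

    // Read every byte of a file without any charset decoding.
    static byte[] readBytes(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.readAllBytes(); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for c:/c.pdf: a temp file containing a non-text byte
        Path tmp = Files.createTempFile("demo", ".pdf");
        byte[] original = {'%', 'P', 'D', 'F', '-', (byte) 0xFF};
        Files.write(tmp, original);

        byte[] viaStream = readBytes(tmp);
        byte[] viaFiles = Files.readAllBytes(tmp); // one-call JDK alternative
        System.out.println("unchanged: "
                + (Arrays.equals(original, viaStream) && Arrays.equals(original, viaFiles)));

        Files.delete(tmp);
    }
}
```

Because no Reader is involved, every byte, including 0xFF, comes back exactly as it was written.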