I would like to read the contents of a GCS blob in chunks of a specified size. As a test, I tried to retrieve the last 5,000 bytes of a 10,000-byte file stored as a GCS blob. The file consists of 1K of "0"s, followed by 1K of "1"s, 1K of "2"s, ..., 1K of "9"s.
    public void testDownloadBytes(Blob blob) throws IOException {
        // 10K file - 1K of 0s, followed by 1K of 1s, 1K of 2s, ...
        ReadChannel reader = blob.reader();
        ByteBuffer byteBuf = ByteBuffer.allocate(10_000);
        reader.seek(5000);
        reader.setChunkSize(10_000);
        int numRead = reader.read(byteBuf);
        logger.info("read " + numRead + " bytes");
        // Note: array() returns the whole backing array, not just the bytes read
        byte[] bytes = byteBuf.array();
        String s = new String(bytes, StandardCharsets.UTF_8);
        logger.info("downloaded '" + s + "'");
    }
Because I started at byte 5000 and asked for 10,000 bytes, I expected to read only the last 5,000 bytes. However, the number of bytes read was 10,000. The first 5,000 bytes were what I expected, starting with the "5"s. The interesting part is that the last 5,000 bytes consisted of a <CR><LF> followed by the beginning of the file: 1K of "0"s, ..., 998 "4"s. Why does this happen, and what can I do to retrieve only the last 5,000 bytes?
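For comparison, here is a minimal sketch of a more defensive way to read at most N bytes from a channel and keep only the bytes that were actually read. It does not diagnose where the extra bytes above come from. The class name `ChunkReader` and the method `readUpTo` are hypothetical helpers, not part of the GCS client library. The sketch relies only on the standard `ReadableByteChannel` interface, which `ReadChannel` extends.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of google-cloud-storage.
final class ChunkReader {

    /**
     * Reads up to maxBytes from the channel, stopping early at end of stream.
     * Returns only the bytes that were actually read.
     */
    static byte[] readUpTo(ReadableByteChannel channel, int maxBytes) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(maxBytes);
        while (buf.hasRemaining()) {
            int n = channel.read(buf);
            if (n < 0) {
                break; // end of stream reached before the buffer filled up
            }
        }
        // position() is the count of bytes written into the buffer so far;
        // copying up to that point avoids decoding unused buffer space.
        return Arrays.copyOf(buf.array(), buf.position());
    }
}
```

With the code in the question, this would presumably be called as `ChunkReader.readUpTo(reader, 10_000)` after `reader.seek(5000)`. If the channel still reports more bytes than remain in the blob, one possible workaround (an assumption, not a confirmed fix) is to cap `maxBytes` at `blob.getSize() - 5000`, so that the buffer cannot hold more than the expected tail.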