How to read a specific range of bytes from a GCS Blob?


I would like to read the contents of a GCS blob in chunks of a specified size. I wrote a test that tries to retrieve the last 5,000 bytes of a 10,000-byte file stored as a GCS blob. The file consists of 1,000 "0"s, followed by 1,000 "1"s, 1,000 "2"s, ..., 1,000 "9"s.

public void testDownloadBytes(Blob blob) throws IOException {
    // 10K file - 1K of 0s, followed by 1K of 1s, 1K of 2s, ...
    ReadChannel reader = blob.reader();
    ByteBuffer byteBuf = ByteBuffer.allocate(10_000);
    reader.seek(5000);
    reader.setChunkSize(10_000);
    int numRead = reader.read(byteBuf);
    logger.info("read " + numRead + " bytes");
    // array() returns the whole 10,000-byte backing array, not just the bytes read
    byte[] bytes = byteBuf.array();
    String s = new String(bytes, StandardCharsets.UTF_8);
    logger.info("downloaded '" + s + "'");
}

Because I started at byte 5000 and asked for 10,000 bytes, I expected to read only the last 5,000 bytes. However, the number of bytes read was 10,000. The first 5,000 bytes were what I expected, starting with the "5"s. The interesting part is that the last 5,000 bytes consisted of a <CR><LF> followed by the beginning of the file: 1,000 "0"s, ..., 998 "4"s. Why did that happen, and what can I do to retrieve only the last 5,000 bytes?
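
For reference, here is a minimal sketch of the result I am after, written against the standard library only so it runs without GCS. It does not explain the behavior above; it only shows the pattern of reading from the current position until the end of the stream and keeping just the bytes that read() reported. The names RangeReadSketch, readUpTo and samplePayload are hypothetical, and the ByteArrayInputStream with skip(5000) stands in for blob.reader() with seek(5000). Since ReadChannel is a ReadableByteChannel, I assume the same helper could be handed the real reader. I also believe newer versions of the library offer a limit on ReadChannel to cap the range, but I have not verified that against my version.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class RangeReadSketch {

    /** Reads up to maxBytes from the channel's current position and
     *  returns only the bytes that were actually read. */
    public static byte[] readUpTo(ReadableByteChannel channel, int maxBytes) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(maxBytes);
        while (buf.hasRemaining()) {
            if (channel.read(buf) < 0) {
                break; // end of stream reached before the buffer filled up
            }
        }
        buf.flip(); // limit = number of bytes read, position = 0
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    /** Builds the 10,000-byte test payload: 1,000 '0's, 1,000 '1's, ..., 1,000 '9's. */
    public static byte[] samplePayload() {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) ('0' + i / 1000);
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(samplePayload());
        in.skip(5000); // stand-in for reader.seek(5000) on the GCS channel
        try (ReadableByteChannel channel = Channels.newChannel(in)) {
            byte[] tail = readUpTo(channel, 10_000);
            System.out.println("read " + tail.length + " bytes, first='"
                    + (char) tail[0] + "', last='" + (char) tail[tail.length - 1] + "'");
        }
    }
}
```

With the in-memory stand-in this yields 5,000 bytes running from the first "5" to the last "9", which is what I expected from the blob.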
