Google Cloud Bucket File download (NodeJS). File Exists but Readable throws error and makes express server restart


In our express server using @google-cloud/storage, downloading a file from our buckets through a readable stream sometimes throws an error, even though the call is wrapped in a try-catch and we subscribe to the stream's .on('error') handler (which never fires). When this happens our express instance restarts: the pod is not killed, but express itself restarts, which is very weird.

The error I get from the express logs looks like this:

TypeError: Cannot read properties of null (reading 'length')
    at getStateLength (/usr/src/node_modules/stream-shift/index.js:16:28)
    at shift (/usr/src/node_modules/stream-shift/index.js:6:99)
    at Duplexify._forward (/usr/src/node_modules/duplexify/index.js:170:35)
    at PassThrough.onreadable (/usr/src/node_modules/duplexify/index.js:136:10)
    at PassThrough.emit (node:events:518:28)
    at emitReadable_ (node:internal/streams/readable:832:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:81:21)

I get another trace that says it blew up at this point:

return state.buffer[0].length

Which seems to correspond to this part of stream-shift code: https://github.com/mafintosh/stream-shift/blob/2ea5f7dcd8ac6babb08324e6e603a3269252a2c4/index.js#L16C1-L16C34

My download code looks like this:

  const { bucketName, keyFilename } = config.google.storage;
  if (!bucketName) {
    throw badImplementation('config.google.storage.bucketName is undefined');
  }
  if (!keyFilename) {
    throw badImplementation('config.google.storage.keyFilename is undefined');
  }

  const storage = new Storage({
    keyFilename,
    retryOptions: { autoRetry: true, maxRetries: 1 },
  });
  const bucket = storage.bucket(bucketName);

  const [exists] = await bucket.file(name).exists();
  if (!exists) {
    const error = `CDN download, file ${name} does not exist`;
    console.log(error);
    throw notFound(error);
  }

  log.info(`CDN download, create read stream on ${name} begin`);
  const readStream = bucket
    .file(name)
    .createReadStream()
    .on('response', (response) => {
      // Server connected and responded with the specified status and headers.
      console.log(`CDN download, stream on file ${name}, response is: ${JSON.stringify(response)}`);
    })
    .on('end', () => {
      // The file is fully downloaded.
      console.log(`CDN download, stream on file ${name}, file fully downloaded`);
    })
    .on('error', (err) => {
      // Something happened while downloading the file
      console.log(`CDN download, stream on file ${name}, error is: ${JSON.stringify(err)}`);
    });

  log.info(`CDN download, create read stream on ${name} done`);
  return readStream;

I thought the file might not exist, but the .exists() check returns true, so the read stream does get created.

I even get a trace from the .on('response') handler identifying the file.

{
  "headers": {
    "cache-control": "no-cache, no-store, max-age=0, must-revalidate",
    "content-disposition": "attachment",
    "content-length": "1309467",
    "content-type": "application/octet-stream",
    "date": "Wed, 24 Jan 2024 11:17:05 GMT",
    "etag": "CLSOmpWa8oMDEAE=",
    "expires": "Mon, 01 Jan 1990 00:00:00 GMT",
    "last-modified": "Tue, 23 Jan 2024 00:00:33 GMT",
    "pragma": "no-cache",
    "server": "UploadServer",
    "vary": "Origin, X-Origin",
    "x-goog-generation": "1705968033761076",
    "x-goog-hash": "crc32c=EeUAng==,md5=Duc9MjxstOaEXhEeZRphIw==",
    "x-goog-metageneration": "1",
    "x-goog-storage-class": "STANDARD",
    "x-goog-stored-content-encoding": "identity",
    "x-goog-stored-content-length": "1309467",
    "x-guploader-uploadid": "ABPtcPpJ0EZifzef-2dHFzbfURL0E_niJIylxjegZyJhjJ0kyhM8FGb7jymom35PJ4UrOcti3mp8CxNuqw"
  }
}

Could it be that, even though the client check says the file exists, we don't have permission to download it?

UPDATE 1: After further investigation and rolling back our Docker images, we discovered that between Jan 9 and Jan 11 a change was pushed to our base image node:20, which seems to be the cause: https://github.com/nodejs/docker-node/commit/ab5769dc69feb4007d9aafb03316ea0e3edb4227

This bumped Node from 20.10 to 20.11, and it's the only plausible explanation for something like this happening. Is there any known issue reported?

UPDATE 2: Pinning the Docker image from node:20 to node:20.10.0 worked around the issue, so something must have been introduced in node:20.11.0 (aka latest). Could anybody from Node or Google investigate what is going on?
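For reference, the workaround described above is a one-line change in the Dockerfile (the exact base image line is assumed; adjust it to your own build):

```dockerfile
# Before: node:20 floats to the latest 20.x patch, which picked up 20.11.0
# FROM node:20
# After: pin to the last known-good release
FROM node:20.10.0
```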

There is 1 answer below.

Answer from nico:

Per this GitHub comment thread, besides downgrading Node to 20.10, you can alternatively use a package override to force stream-shift version 1.0.2, which should resolve this issue.
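With npm, that override is a small package.json addition (yarn uses a "resolutions" field instead); this is a sketch of the npm form:

```json
{
  "overrides": {
    "stream-shift": "1.0.2"
  }
}
```

After adding it, delete node_modules and the lockfile and reinstall so the override actually takes effect.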

Hopefully that gets updated in the @google-cloud/storage package soon as well.