Cloud-Run process fails with 500 status-code and a membarrier gvisor error

Background

The service is a simple Go program that pipes a file from Cloud Storage to the browser.

Everything works fine on my MacBook, but some requests fail on Cloud-Run (managed), mostly those for large mp4 files.

Problem

The logs just show a 500 status, as does the browser. But my service doesn't log anything other than starting to copy the file. No IO errors or anything.

This message is shown 4 seconds before the 500 status:

Container Sandbox Limitation: Unsupported syscall membarrier(0x10,0x0,0x0,0x8,0x775dce0b030,0x775dce0b000). Please, refer to https://gvisor.dev/c/linux/amd64/membarrier for more information.

I cannot reproduce this locally. Works fine locally with the same configuration and GCP buckets.

The service works fine on Cloud-Run with smaller files, like images. Just not the videos I've tried.

I've tried

  • Logging everything up to the io.Copy. No errors; it hangs after io.Copy is called.
  • Increasing the memory of the container. It's now running at 1G. No change from 512M.
  • Running in a Docker container locally with the same configuration, same credentials. No problems.
  • Reaching out to GCP on Twitter

Update 2019-08-16

I created a very simple service that prints 'A' to an http.ResponseWriter. It also works perfectly locally, yet returns 500 on Cloud-Run with large-ish sizes: 1 MB OK, 5 MB OK, 50 MB fails, 100 MB fails, etc. There are no membarrier messages when this service runs.

Code is available here: https://github.com/andrioid/reproduce-cloud-run-bug

Reported on issue-tracker as well: https://issuetracker.google.com/issues/139511257

Update 2: Probable cause

There seems to be a hard limit of 32 MB on response sizes.

https://cloud.google.com/run/quotas

Very disappointing that this cannot be increased, and that neither the error nor the log file mentions this limit.

There are 3 best solutions below

0
On

Note that you can always report issues at Google Cloud official issue trackers. https://cloud.google.com/support/docs/issue-trackers.

In most cases, unimplemented system calls in gVisor don't crash the application, because most language runtimes fall back to more primitive or legacy syscalls.

I'd recommend following the issue linked in the other answer and replying there that you hit this on Cloud Run, ideally with a small program that triggers it. Such issues are often fixed within a few weeks, depending on the release cycles.

It doesn't look like Go makes this syscall in its high-level code [1], but it may simply be that the low-level Go runtime code, written in assembly, is what triggers it.

1
On

There is an outstanding issue at https://github.com/google/gvisor/issues/267 to implement membarrier, but for now this is not allowed by the container sandbox.

1
On

The 32 MB limit for HTTP requests and responses is not a Cloud Run limitation; it is a limitation of the GFE (Google Front End) that sits in front of Cloud Run Managed.

Note: I am not including Cloud Run on Kubernetes in this answer, only Cloud Run Managed.

The GFE is a reverse proxy that terminates TCP connections. The GFE provides additional features to Cloud Run such as public IP hosting of its public DNS name, Denial of Service (DoS) protection, and TLS termination.

The GFE is used for many Google services and for this reason, I doubt that this limitation will be changed in the near future.