Background
The service is a simple Go program that pipes a file from Cloud Storage to the browser.
Everything works fine on my MacBook, but some requests fail on Cloud Run (managed), mostly for large MP4 files.
Problem
The logs just show a 500 status, as does the browser. But my service doesn't log anything other than starting to copy the file. No I/O errors or anything.
This message is shown 4 seconds before the 500 status:
Container Sandbox Limitation: Unsupported syscall membarrier(0x10,0x0,0x0,0x8,0x775dce0b030,0x775dce0b000). Please, refer to https://gvisor.dev/c/linux/amd64/membarrier for more information.
I cannot reproduce this locally. Works fine locally with the same configuration and GCP buckets.
The service works fine on Cloud Run with smaller files, like images. Just not with the videos I've tried.
I've tried
- Logging everything up to the io.Copy. No errors; it hangs after io.Copy is called.
- Increasing the memory of the container. It's now running at 1G. No change from 512M.
- Running in a Docker container locally with the same configuration, same credentials. No problems.
- Reaching out to GCP on Twitter
Update 2019-08-16
I created a very simple service that prints 'A' to an http.ResponseWriter. It also works perfectly locally, yet returns 500 on Cloud Run with largish sizes: 1 MB OK, 5 MB OK, 50 MB fails, 100 MB fails, etc. There are no membarrier messages when this service runs.
Code is available here: https://github.com/andrioid/reproduce-cloud-run-bug
Reported on issue-tracker as well: https://issuetracker.google.com/issues/139511257
Update 2: Probable cause
There seems to be a hard limit of 32 MB on response sizes:
https://cloud.google.com/run/quotas
It's very disappointing that this cannot be increased, and that neither the error nor the log mentions this limit.
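Since the platform doesn't say why the request failed, one option is to check the size up front and fail with an explicit message. A minimal sketch, assuming the limit means 32 MiB and that the object's size is known before streaming (for example from the object's metadata); maxResponseBytes and checkResponseSize are hypothetical names, not part of any Cloud Run API.

```go
package main

import "fmt"

// maxResponseBytes mirrors the 32 MB response limit from the quotas page.
// Assumption: the limit is treated as 32 MiB here.
const maxResponseBytes int64 = 32 << 20

// checkResponseSize returns an error if a response of the given size would
// exceed the assumed limit, so the service can log and report a clear
// message instead of an unexplained 500.
func checkResponseSize(size int64) error {
	if size > maxResponseBytes {
		return fmt.Errorf("response of %d bytes exceeds the %d byte limit", size, maxResponseBytes)
	}
	return nil
}

func main() {
	// The sizes I tried with the reproduction service.
	for _, mb := range []int64{1, 5, 50, 100} {
		if err := checkResponseSize(mb << 20); err != nil {
			fmt.Printf("%d MB: %v\n", mb, err)
			continue
		}
		fmt.Printf("%d MB: OK\n", mb)
	}
}
```

With this assumed limit, 1 MB and 5 MB pass while 50 MB and 100 MB are rejected, which is the same pattern I saw on Cloud Run.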
Note that you can always report issues on the official Google Cloud issue trackers: https://cloud.google.com/support/docs/issue-trackers
In most cases, unimplemented system calls in gVisor don't crash the application, because most language runtimes fall back to more primitive or legacy syscalls.
I'd recommend following the issue linked in the other answer and replying that you hit this on Cloud Run, ideally with a small program that triggers it. Such issues are often fixed within a few weeks, depending on the release cycles.
It doesn't appear that Go makes this syscall in its high-level code [1], but the low-level Go runtime code written in assembly might be causing it.