GCP Cloud Run container stalls when using Cloud Build to deploy with tag "latest" and same image name

I have spent the last few days investigating this issue and tried many things. The issue started around February 7th, 2024.

The summary of the issue is:

  • When I deploy a replacement service to GCP Cloud Run using Cloud Build and Container Registry, my container stalls and Cloud Run forcibly terminates it with SIGTERM for exceeding the maximum request timeout.
  • I traced the stall to a function that uses the GCP Spanner Python package, so I believe it may be authentication related.
  • When I used the GCP console to revert to the previous working revision, the container worked as expected.
  • On a hunch, I deployed the same container under a different service name and image name, still with the tag "latest" in Container Registry, and it worked. The container did not time out and all GCP APIs worked as expected.
  • In the same vein, I tried redeploying to the new service name using a different image name with the tag "latest", which worked as well.
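
One way to localize a stall like the one in the second bullet is a watchdog that dumps every thread's stack when a call runs too long. The following is a minimal sketch using only the standard library's `faulthandler` module; the names `stall_watchdog` and `suspect_call` and the deadlines are hypothetical and do not come from the service's code.

```python
import faulthandler
import sys
import time
from contextlib import contextmanager


@contextmanager
def stall_watchdog(seconds, file=None):
    """Dump every thread's stack to `file` if the block runs longer than `seconds`."""
    target = file if file is not None else sys.stderr
    # Arm a background timer; exit=False keeps the process alive after the dump.
    faulthandler.dump_traceback_later(seconds, exit=False, file=target)
    try:
        yield
    finally:
        # Disarm the timer once the block finishes or raises.
        faulthandler.cancel_dump_traceback_later()


def suspect_call():
    """Hypothetical stand-in for the Spanner call that appears to hang."""
    time.sleep(0.3)


if __name__ == "__main__":
    # The deadline (0.1 s) is shorter than the call (0.3 s), so a dump is written to stderr.
    with stall_watchdog(0.1):
        suspect_call()
```

Wrapped around the real call with a deadline below the Cloud Run request timeout, the dump would land in the revision's logs before the SIGTERM arrives, which may show whether the thread is blocked in credential refresh, in connection setup, or elsewhere.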

The format of my Cloud Build .yaml file is:

steps:
  - name: 'gcr.io/cloud-builders/docker'
    entrypoint: 'bash'
    args: ['-c', 'docker build --build-arg=GIT_ACCESS_TOKEN=$$_GIT_ACCESS_TOKEN -t gcr.io/myproject/containername:latest .']
    secretEnv: ['_GIT_ACCESS_TOKEN']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/myproject/containername:latest']
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args: ['run', 'deploy', 'container name', '--image', 'gcr.io/myproject/containername:latest', '--region', 'us-central1']
availableSecrets:
  secretManager:
    - versionName: projects/myproject/secrets/git_access_token_my_repo/versions/latest
      env: '_GIT_ACCESS_TOKEN'
images:
  - 'gcr.io/myproject/containername:latest'

Our team has used this exact format for over a year with no issues. Whenever we deployed a service, we ran this gcloud command without making any changes to the .yaml file:

gcloud builds submit --region=us-central1 --config cloudbuild.yaml

However, that no longer works. To deploy the service now, we modify the .yaml file like this (only the image name is different):

steps:
  - name: 'gcr.io/cloud-builders/docker'
    entrypoint: 'bash'
    args: ['-c', 'docker build --build-arg=GIT_ACCESS_TOKEN=$$_GIT_ACCESS_TOKEN -t gcr.io/myproject/containernamev2:latest .']
    secretEnv: ['_GIT_ACCESS_TOKEN']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/myproject/containernamev2:latest']
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args: ['run', 'deploy', 'container name', '--image', 'gcr.io/myproject/containernamev2:latest', '--region', 'us-central1']
availableSecrets:
  secretManager:
    - versionName: projects/myproject/secrets/git_access_token_my_repo/versions/latest
      env: '_GIT_ACCESS_TOKEN'
images:
  - 'gcr.io/myproject/containernamev2:latest'

My questions are:

  • Why doesn't the old .yaml file work anymore? Does it have something to do with the deprecation of Container Registry?
  • If it's permissions related, how can I check this?
  • Was there some change that happened around February 7th?
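
For the permissions question, one possible starting point is to log which identity the container actually runs as and compare a working revision against a stalling one. The sketch below queries the metadata server with the standard library only. It assumes the standard `metadata.google.internal` endpoint and `Metadata-Flavor: Google` header are reachable from the container; the function names `metadata_request` and `runtime_identity` are hypothetical.

```python
import urllib.request

METADATA_BASE = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/"
)


def metadata_request(field):
    """Build a metadata-server request for one field of the default service account."""
    # The metadata server rejects requests that lack this header.
    return urllib.request.Request(
        METADATA_BASE + field, headers={"Metadata-Flavor": "Google"}
    )


def runtime_identity(timeout=5.0):
    """Return the service account email and scopes, or the error hit while fetching them."""
    result = {}
    for field in ("email", "scopes"):
        try:
            with urllib.request.urlopen(metadata_request(field), timeout=timeout) as resp:
                result[field] = resp.read().decode().strip()
        except OSError as exc:
            # URLError and socket timeouts are OSError subclasses; record instead of raising.
            result[field] = f"unavailable: {exc}"
    return result
```

Printing `runtime_identity()` at startup in both revisions would show whether they run as the same service account. If they do, the roles granted to that account on the Spanner instance are the next thing to compare; if the lookup itself is slow or fails, that points at the credential path rather than at Spanner.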

Dockerfile for additional context:

FROM python:3.9-slim
ARG GIT_ACCESS_TOKEN

RUN apt-get update \
&& apt-get install gcc -y \
&& apt-get clean \
&& apt-get install -y git

RUN git config --global url."https://${GIT_ACCESS_TOKEN}@github.com".insteadOf "ssh://git@github.com"

ENV PYTHONUNBUFFERED True
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./

RUN pip install --no-cache-dir -r requirements.txt
RUN pip install --no-cache-dir git+ssh://git@github.com/myorg/my-repo.git@main#subdirectory=python_packages/src/package-one
RUN pip install --no-cache-dir git+ssh://git@github.com/myorg/my-repo.git@main#subdirectory=python_packages/src/package-two

CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app