I have spent the last few days investigating this issue and tried many things. The issue started around February 7th, 2024.
The summary of the issue is:
- When I deploy a replacement service to GCP Cloud Run using Cloud Build and Container Registry, my container stalls and is forcibly SIGTERM'ed by Cloud Run for exceeding the maximum request timeout.
- I was able to pinpoint the stall in my container to a function that uses the GCP Spanner Python package, so I believe it may be authentication related.
- When I reverted to the previous working revision in the GCP console, the container worked as expected.
- On a hunch I deployed the same container to a different service name + image name with tag "latest" in container registry, which worked. Container did not time out and all GCP APIs worked as expected.
- In the same vein, I tried redeploying to the new service name using a different image name with the tag "latest", which worked as well.
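For anyone trying to reproduce this kind of diagnosis: one way to confirm where a request stalls is to run the suspect call in a worker thread and wait on it with a deadline. The sketch below is illustrative only and is not the code from the affected service. `run_with_timeout` and `stand_in_call` are hypothetical names, and the real function that uses the Spanner client would take the place of the stand-in.

```python
import concurrent.futures
import time


def run_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn in a worker thread and wait at most timeout_s seconds.

    Returns (finished, result, elapsed_seconds). If the deadline passes,
    finished is False and result is None. The worker thread is not
    cancelled; it keeps running in the background until fn returns.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = executor.submit(fn, *args, **kwargs)
    try:
        result = future.result(timeout=timeout_s)
        return True, result, time.monotonic() - start
    except concurrent.futures.TimeoutError:
        return False, None, time.monotonic() - start
    finally:
        # Do not block here waiting for a possibly stalled call.
        executor.shutdown(wait=False)


def stand_in_call(delay_s):
    """Hypothetical stand-in for the function that talks to Spanner."""
    time.sleep(delay_s)
    return "done"


if __name__ == "__main__":
    finished, result, elapsed = run_with_timeout(stand_in_call, 0.1, 0.5)
    print(f"finished={finished} result={result} elapsed={elapsed:.2f}s")
```

Logging the `finished` flag and elapsed time around each external call narrows a stall down to a single function without waiting for Cloud Run to terminate the request.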
The format of my Cloud Build .yaml file is:
steps:
- name: 'gcr.io/cloud-builders/docker'
  entrypoint: 'bash'
  args: ['-c', 'docker build --build-arg=GIT_ACCESS_TOKEN=$$_GIT_ACCESS_TOKEN -t gcr.io/myproject/containername:latest .']
  secretEnv: ['_GIT_ACCESS_TOKEN']
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/myproject/containername:latest']
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: gcloud
  args: ['run', 'deploy', 'container name', '--image', 'gcr.io/myproject/containername:latest', '--region', 'us-central1']
availableSecrets:
  secretManager:
  - versionName: projects/myproject/secrets/git_access_token_my_repo/versions/latest
    env: '_GIT_ACCESS_TOKEN'
images:
- 'gcr.io/myproject/containername:latest'
Our team has been using this exact format for over a year with no issues. Whenever we deployed a service, we ran this gcloud command without making any changes to the .yaml file:
gcloud builds submit --region=us-central1 --config cloudbuild.yaml
However, that no longer works. To deploy the service now, we have to modify the .yaml file like this (different image name):
steps:
- name: 'gcr.io/cloud-builders/docker'
  entrypoint: 'bash'
  args: ['-c', 'docker build --build-arg=GIT_ACCESS_TOKEN=$$_GIT_ACCESS_TOKEN -t gcr.io/myproject/containernamev2:latest .']
  secretEnv: ['_GIT_ACCESS_TOKEN']
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/myproject/containernamev2:latest']
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: gcloud
  args: ['run', 'deploy', 'container name', '--image', 'gcr.io/myproject/containernamev2:latest', '--region', 'us-central1']
availableSecrets:
  secretManager:
  - versionName: projects/myproject/secrets/git_access_token_my_repo/versions/latest
    env: '_GIT_ACCESS_TOKEN'
images:
- 'gcr.io/myproject/containernamev2:latest'
My questions are:
- Why doesn't the old .yaml file work anymore? Does it have something to do with the deprecation of Container Registry?
- If it's permissions related, how can I check this?
- Was there some change that happened around February 7th?
Dockerfile for additional context:
FROM python:3.9-slim
ARG GIT_ACCESS_TOKEN
RUN apt-get update \
    && apt-get install gcc -y \
    && apt-get clean \
    && apt-get install -y git
RUN git config --global url."https://${GIT_ACCESS_TOKEN}@github.com".insteadOf "ssh://git@github.com"
ENV PYTHONUNBUFFERED True
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install --no-cache-dir git+ssh://git@github.com/myorg/my-repo.git@main#subdirectory=python_packages/src/package-one
RUN pip install --no-cache-dir git+ssh://git@github.com/myorg/my-repo.git@main#subdirectory=python_packages/src/package-two
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app