Pods can't pull image from GCR after configuring google cloud sql proxy


I have a simple application (REST APIs built with Python and Flask) that runs well on Google Kubernetes Engine (GKE). My CI/CD setup builds a Docker image, pushes it to Google Container Registry (GCR), and then deploys it to GKE. Everything worked well. Now I have added a database, hosted on Google Cloud SQL. To access the database from Kubernetes, I'm using the Cloud SQL proxy (as a sidecar) and Workload Identity, as recommended by Google.

My problem is, after configuring cloud sql proxy, I'm getting this error:

ImagePullBackOff: Cannot pull image 'gcr.io/xxx-project/xxx-image:xxx-tag' from the registry.

The Cloud SQL proxy image is pulled correctly (I think because it's hosted in a public registry), but my own image is not, so the pod never starts.

Did I miss something? Should I add Docker credentials? It's strange, because it was working before I set up the Cloud SQL proxy.

Many thanks for your help,

Best regards

Answer by Tom Greenwood:

I think there's something important to understand here: Autopilot doesn't use Workload Identity, or anything else related to the pod's permissions, to pull images. It uses the default compute service account for your project.

It is the nodes that need permission to pull images, not the pods. See this note from the GCP documentation on Workload Identity.

Note: Even with Workload Identity enabled, GKE still uses the configured Google Service Account for the node pool to pull container images from the image registry. If you encounter ImagePullBackOff or ErrImagePull errors, check the troubleshooting documentation.
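To see which account the nodes actually pull with, you can inspect the service account configured on the node pool. Here is a minimal sketch, assuming the gcloud CLI is installed and authenticated; the helper names and all cluster, zone, pool and project values are hypothetical placeholders. A reported value of `default` means the Compute Engine default service account, whose email follows the pattern `PROJECT_NUMBER-compute@developer.gserviceaccount.com`.

```python
def default_compute_sa(project_number: str) -> str:
    """Email of the Compute Engine default service account for a project number."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def node_pool_sa_command(cluster: str, zone: str, pool: str) -> list[str]:
    """Build (but do not run) the gcloud command that prints a node pool's service account."""
    return [
        "gcloud", "container", "node-pools", "describe", pool,
        "--cluster", cluster,
        "--zone", zone,
        "--format", "value(config.serviceAccount)",
    ]


def resolve_node_sa(reported: str, project_number: str) -> str:
    """Translate gcloud's 'default' placeholder into the actual service account email."""
    reported = reported.strip()
    if reported == "default":
        return default_compute_sa(project_number)
    return reported
```

The command list can be passed to `subprocess.run(..., capture_output=True, text=True)` and its standard output fed to `resolve_node_sa` to get the email that needs read access to the registry.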

I had the same thing happen to me, and it turned out that the default compute service account had been deleted. I restored it (following the instructions under "Deleted Compute Engine default service account") and granted it storage.admin permissions, which resolved the issue.
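The permission grant described above can be sketched as a gcloud call. This is an illustration rather than the exact command I ran: the helper name and the project and account values are hypothetical, and the role defaults to `roles/storage.admin` only because that is what I used. The narrower `roles/storage.objectViewer` is usually enough for pulling images from GCR.

```python
def grant_role_command(project_id: str, sa_email: str,
                       role: str = "roles/storage.admin") -> list[str]:
    """Build (but do not run) the gcloud command that grants a role to a service account."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        "--member", f"serviceAccount:{sa_email}",
        "--role", role,
    ]


# Hypothetical values, for illustration only.
example = grant_role_command(
    "my-project",
    "123456789012-compute@developer.gserviceaccount.com",
)
print(" ".join(example))
```

After the binding is in place, deleting the failing pod lets Kubernetes retry the pull with the updated node permissions.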