Custom Image Pulled Every Time in Google Dataproc Serverless

I am using a custom container image with Dataproc Serverless. Every time I run a job, the image is pulled again, which adds 1 minute of extra processing time. We will be running 1,000+ jobs in production, so this will become a significant performance bottleneck.

Is there any way to tell Dataproc to cache the image so that it is not pulled every time?

Pulling image us.gcr.io/docker_image:version
About to run 'docker pull us.gcr.io/docker_image:version' with retries...
1.5: Pulling from docker_image
5eb5b503b376: Already exists
7967823e23a4: Pulling fs layer
8d68a13eb796: Pulling fs layer
72ed51b4aa20: Pulling fs layer
7967823e23a4: Download complete
7967823e23a4: Pull complete
8d68a13eb796: Verifying Checksum
8d68a13eb796: Download complete
8d68a13eb796: Pull complete
72ed51b4aa20: Download complete
72ed51b4aa20: Pull complete

There is 1 best solution below

Not yet. This is a work in progress and should be available in a couple of months.

Update: image streaming support for container images hosted in Google Artifact Registry was released to GA on October 1st, 2022.
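
Note that the image in the question is pulled from a `us.gcr.io` host, while the update only mentions images hosted in Artifact Registry. As a rough sketch (not an official check), the helper below classifies an image URI by its hostname, assuming Artifact Registry Docker repositories use hosts of the form `LOCATION-docker.pkg.dev` and Container Registry uses `gcr.io` or a `*.gcr.io` subdomain. The function name `registry_kind` and the `my-project` / `my-repo` path are hypothetical, and a hostname check alone cannot tell whether image streaming is actually in effect for a given job.

```python
def registry_kind(image_uri: str) -> str:
    """Classify a container image URI by its registry hostname.

    Assumption: Artifact Registry Docker hosts end in "-docker.pkg.dev",
    and Container Registry hosts are "gcr.io" or a "*.gcr.io" subdomain.
    """
    # The registry host is everything before the first "/".
    host = image_uri.split("/", 1)[0].lower()
    if host.endswith("-docker.pkg.dev"):
        return "artifact-registry"
    if host == "gcr.io" or host.endswith(".gcr.io"):
        return "container-registry"
    return "other"


# Image URI from the question's log.
print(registry_kind("us.gcr.io/docker_image:version"))
# Hypothetical Artifact Registry path, for comparison.
print(registry_kind("us-docker.pkg.dev/my-project/my-repo/docker_image:1.5"))
```

If the first call reports `container-registry`, moving the image to an Artifact Registry repository is probably a prerequisite for benefiting from the feature described in the update.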