Custom Container Image for Google Dataproc PySpark Batch Job


I am exploring the newly introduced Google Dataproc Serverless. While submitting a job, I want to use a custom image (via the --container-image argument) so that all my Python libraries and related files are already present in the container and the job can execute faster.

I have googled and found only this Dataproc custom images page, which talks about custom Dataproc VM images. I did not see anything else.

Can you please confirm whether the custom image link above is the right one, or is there some other base image we need to use to build the container Docker image?


There is 1 best solution below

BEST ANSWER

No, the above link is for custom VM images for Dataproc on GCE clusters.

To create a custom container image for Dataproc Serverless for Spark, please follow the custom containers guide.
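As a rough illustration of what that guide describes, here is a minimal Dockerfile sketch. The key point is that the image must NOT bundle Spark itself, because Dataproc Serverless mounts the Spark runtime into the container at submission time. The package list, interpreter path, and UID/GID values below are illustrative assumptions, not requirements from this thread:

```dockerfile
# Sketch of a custom container for Dataproc Serverless for Spark.
# Do NOT install Spark or Java here: Dataproc Serverless mounts its
# own Spark runtime into the container when the batch starts.
FROM debian:12-slim

# Utilities plus a Python interpreter with the job's libraries
# pre-installed (pandas/numpy are just example dependencies).
RUN apt-get update && \
    apt-get install -y procps tini python3 python3-pip && \
    rm -rf /var/lib/apt/lists/* && \
    pip3 install --no-cache-dir --break-system-packages pandas numpy

# Point PySpark at the interpreter baked into this image (assumed path).
ENV PYSPARK_PYTHON=/usr/bin/python3

# Run as a non-root 'spark' user, as the guide recommends
# (the UID/GID chosen here are illustrative).
RUN groupadd -g 1099 spark && useradd -u 1099 -g 1099 -m spark
USER spark
```

After building and pushing the image to a registry such as Artifact Registry, you reference it when submitting the batch, e.g. `gcloud dataproc batches submit pyspark main.py --region=REGION --container-image=REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG` (project, repo, and tag here are placeholders).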

As a side note, all Dataproc Serverless-related documentation is on the https://cloud.google.com/dataproc-serverless website.