Long-running Dataproc Spark job on Cloud Run on GCP


I would like some expert advice. Can I run a Dataproc (PySpark) job as a container image on Cloud Run (as a service or as a job)? A Dataproc job may take anywhere from a few minutes to several hours to complete, so the orchestration has to be asynchronous. The workload can be either batch or event-driven, triggered by a REST API call or by a Pub/Sub event.

I hope my question is clear.

Options (I feel that options 1 and 2 are more costly than option 3):

  1. Run the Dataproc job from Cloud Composer (the Composer environment stays up while idle, even with autoscaling).
  2. Run on GKE Autopilot (again with a custom image for the Spark job).
  3. Run on Cloud Run (as a service if the work finishes within 60 minutes, or as a job if it takes a few hours), which, as I understand it, bills for actual usage (requests and compute time) rather than for idle capacity.
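To make the service-versus-job split in option 3 concrete, here is a minimal sketch of the decision. The helper name `pick_cloud_run_target` and its parameters are hypothetical; the only fact it relies on is the 60-minute request cap on Cloud Run services mentioned above. It also assumes that if the driver only submits the Spark work and returns without waiting, a service is sufficient regardless of how long the Spark job itself runs.

```python
from datetime import timedelta

# Cloud Run services cap a single request at 60 minutes.
SERVICE_REQUEST_LIMIT = timedelta(minutes=60)


def pick_cloud_run_target(expected_runtime: timedelta, wait_for_completion: bool) -> str:
    """Return "service" or "job" for the container that drives the Spark work.

    Hypothetical helper for illustration only.
    """
    if not wait_for_completion:
        # Fire-and-forget submission returns quickly, so a service is enough.
        return "service"
    if expected_runtime <= SERVICE_REQUEST_LIMIT:
        return "service"
    # Waiting longer than the request cap needs a Cloud Run job instead.
    return "job"
```

For example, `pick_cloud_run_target(timedelta(hours=3), wait_for_completion=True)` selects a job, while the same runtime with `wait_for_completion=False` selects a service.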

The requirement is that the Dataproc Spark jobs (workflows on ephemeral clusters) run on demand, driven by the requests received each day, which is why I am thinking of using Cloud Run.
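As a rough sketch of what "workflow-ephemeral" could look like from a Cloud Run container, the snippet below builds an inline Dataproc workflow template with a managed cluster (created for the workflow and deleted afterwards) and starts it without waiting for completion. It assumes the `google-cloud-dataproc` client library is installed; the function names, the step ID, the cluster name, and the script URI passed in are all placeholders of mine, not part of any existing setup.

```python
def build_inline_template(main_python_file_uri: str, cluster_name: str) -> dict:
    """Build a template with a managed (ephemeral) cluster and one PySpark step."""
    return {
        "placement": {
            "managed_cluster": {
                "cluster_name": cluster_name,
                # An empty zone_uri leaves zone selection to Dataproc.
                "config": {"gce_cluster_config": {"zone_uri": ""}},
            }
        },
        "jobs": [
            {
                "step_id": "pyspark-step",  # placeholder step name
                "pyspark_job": {"main_python_file_uri": main_python_file_uri},
            }
        ],
    }


def submit_async(project_id: str, region: str, template: dict) -> str:
    """Start the workflow and return the long-running operation name."""
    # Imported lazily so the template builder works without the client library.
    from google.cloud import dataproc_v1

    client = dataproc_v1.WorkflowTemplateServiceClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    operation = client.instantiate_inline_workflow_template(
        request={
            "parent": f"projects/{project_id}/regions/{region}",
            "template": template,
        }
    )
    # operation.result() is deliberately not called, so the caller is not
    # blocked for the minutes or hours the Spark job may take.
    return operation.operation.name
```

Because `submit_async` returns as soon as the workflow is accepted, the Cloud Run request itself stays short; the returned operation name could be stored and polled later if the caller needs the outcome.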

On Cloud Run, jobs can be chained with Workflows for dependencies, run in batch on a Cloud Scheduler schedule, and triggered by events through Eventarc.
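For the event-driven path, a Pub/Sub message delivered over HTTP arrives wrapped in a push envelope with a base64-encoded `data` field. The sketch below shows one way the receiving container might unpack it before deciding which Spark job to launch. The function name `parse_pubsub_push` is hypothetical, and it assumes the publisher sends a JSON payload; the payload fields themselves are whatever the publisher chooses.

```python
import base64
import json


def parse_pubsub_push(envelope: dict) -> dict:
    """Extract the JSON payload from a Pub/Sub push-style request body."""
    message = envelope.get("message")
    if not isinstance(message, dict) or "data" not in message:
        raise ValueError("not a Pub/Sub push envelope")
    # The message body is base64-encoded inside the envelope.
    payload = base64.b64decode(message["data"]).decode("utf-8")
    # Assumption: the publisher sends JSON; adjust if it sends plain text.
    return json.loads(payload)
```

The same function can back a plain REST trigger as well: a REST caller would post the job parameters directly, so only the event path needs the decoding step.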

Which of these is the best architecture decision?
