What is the HTTP URL to be given in Cloud Scheduler?


I'm very new to Google Cloud Platform. I'm trying to schedule a job with Cloud Scheduler that picks up and runs a script called "pipeline.py" in my Dataflow console.

I don't understand what the URL should be when creating a Cloud Scheduler job. Please help me figure out how to go about it.


There is an excellent example here of scheduling Dataflow jobs with Cloud Scheduler. It uses Terraform to create the Cloud Scheduler resource, as you can see here:

http_target {
    # POST to the Dataflow REST API templates:launch endpoint; the gcsPath query
    # parameter points at the template file staged in a Cloud Storage bucket
    http_method = "POST"
    uri = "https://dataflow.googleapis.com/v1b3/projects/${var.project_id}/locations/${var.region}/templates:launch?gcsPath=gs://${var.bucket}/templates/dataflow-demo-template"
    ...
}

If you are not familiar with Terraform, you could just use the gcloud SDK to accomplish the same thing:

gcloud beta scheduler jobs create http job_name \
  --schedule='every day' \
  --uri="https://dataflow.googleapis.com/v1b3/projects/$PROJECT_ID/locations/$REGION/templates:launch?gcsPath=gs://$BUCKET_NAME/templates/dataflow-demo-template" \
  --message-body-from-file="dataflow_message_body.json" \
  --oauth-service-account-email=$DATAFLOW_SERVICE_ACCOUNT
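Once the job exists, you can trigger it once by hand to check that the URI and message body are accepted, instead of waiting for the schedule. A quick check, assuming the same job_name as above:

# Run the scheduler job immediately; a new Dataflow job should then appear in the console
gcloud scheduler jobs run job_name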

The file dataflow_message_body.json contains JSON similar to:

{
  "jobName": "dataflow_job_name",
  "parameters": {
    "inputFilePattern": "gs://$BUCKET_NAME/dataflow/input.txt",
    "outputDirectory": "gs://$BUCKET_NAME/dataflow/output.txt.gz",
    "outputFailureFile": "gs://$BUCKET_NAME/dataflow/failure",
    "compression": "GZIP"
  },
  "gcsPath": "gs://dataflow-templates/latest/Bulk_Compress_GCS_Files"
}
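Note that gcloud will not expand the $BUCKET_NAME placeholders inside the file for you, so either hard-code the bucket name or substitute it before creating the job. A minimal sketch using envsubst (the template file name here is just a placeholder):

# Fill in $BUCKET_NAME before creating the scheduler job (hypothetical file names)
export BUCKET_NAME=your-bucket-name
envsubst < dataflow_message_body.template.json > dataflow_message_body.json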

And if you want to do this in the console, just go to your project and create a new Cloud Scheduler job, which has the same fields as described above.

If you want to know what the Google-provided templates look like, you can take a look here. If you want to know how to create your own templates and what format they should have, take a look here. When starting a Dataflow job, you always refer to a location in a bucket, whether it is the Google-provided gs://dataflow-templates bucket or your own bucket.
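Since you already have a pipeline.py, the usual approach is to stage it as a classic template in your own bucket and then point the gcsPath in the Cloud Scheduler URI at that location. A minimal sketch, assuming pipeline.py is an Apache Beam pipeline built on the standard PipelineOptions (bucket, project, and template names are placeholders):

# Stage pipeline.py as a classic Dataflow template in your own bucket (names are placeholders)
python pipeline.py \
  --runner=DataflowRunner \
  --project=$PROJECT_ID \
  --region=$REGION \
  --staging_location=gs://$BUCKET_NAME/staging \
  --temp_location=gs://$BUCKET_NAME/temp \
  --template_location=gs://$BUCKET_NAME/templates/my-pipeline-template

The gcsPath in the scheduler's URI (or the "gcsPath" field of the message body) would then be gs://$BUCKET_NAME/templates/my-pipeline-template.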