How to keep a Google Pub/Sub cron-substitute Python routine from timing out intermittently

I'm using this Google Cloud sample from GitHub to simulate cron on my cloud server: reliable-task-scheduling-compute-engine-sample

https://github.com/GoogleCloudPlatform/reliable-task-scheduling-compute-engine-sample/blob/master/readme.md

It runs continuously, and generally without interruption. However, from time to time, and without a cause I can identify, it throws this timeout error:

Traceback (most recent call last):
  File "test_executor.py", line 62, in <module>
    test_executor.watch_topic()
  File "/home/bitnami/cloud_app/task_scheduling/gce/cron_executor.py", line 234, in watch_topic
    msgs = self.get_messages()
  File "/home/bitnami/cloud_app/task_scheduling/gce/cron_executor.py", line 146, in get_messages
    body=body).execute()
  File "/usr/local/lib/python2.7/dist-packages/oauth2client/_helpers.py", line 133, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/http.py", line 840, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 504 when requesting https://pubsub.googleapis.com/v1beta2/projects/southern-ivy-106215/subscriptions/test_sample_task_task:pull?alt=json returned "The service was unable to fulfill your request. Please try again. [code=8a75]">

If the 504 error is due to the server not receiving a timely response, how do I modify the routine cron_executor.py to keep waiting instead of aborting?

As this routine functions as a cron, it's a headache to have to check once or twice a day just to see if it's still running.

Can someone please help solve the reliability problem with this routine?

There is 1 best solution below

Google recommends retrying temporary errors like 504 with exponential back-off.
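To make "exponential back-off" concrete, here is a minimal sketch of the idea using only the standard library. The names `TransientError` and `call_with_backoff` are hypothetical and used only for illustration. In the routine from the question you would presumably catch `googleapiclient.errors.HttpError` and check that its status is a temporary one such as 504, rather than a custom exception.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a temporary failure such as an HTTP 504."""


def call_with_backoff(func, max_retries=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential back-off."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_retries:
                raise  # out of retries: let the caller see the error
            # Wait 1s, 2s, 4s, ... (capped at max_delay), plus up to 1s
            # of random jitter so that clients do not retry in lockstep.
            delay = min(base_delay * (2 ** attempt), max_delay)
            sleep(delay + random.random())
```

The `sleep` parameter is only there so the waiting can be swapped out when testing; in normal use the default `time.sleep` applies.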

You could implement this yourself, but most of Google's client libraries already support it, so the easiest thing to do is to use their built-in retry support. In the code you link to, you just have to add num_retries=5 (or any other number of retries) to each execute() call.
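As a rough sketch of that change: the helper `execute_with_retries` below is a hypothetical wrapper, not part of the sample, and it assumes `request` is a request object built by googleapiclient, whose `execute()` accepts a `num_retries` argument. The comments show what the edit would look like at the call visible in the traceback.

```python
NUM_RETRIES = 5  # any reasonable number of retries


def execute_with_retries(request, num_retries=NUM_RETRIES):
    """Run a prepared API request, asking the client library to retry.

    `request` is assumed to be a googleapiclient request object; the
    library then retries temporary failures with back-off on its own.
    """
    return request.execute(num_retries=num_retries)


# In get_messages() of cron_executor.py the edit would amount to:
#   ... body=body).execute()                 # before
#   ... body=body).execute(num_retries=5)    # after
```

If the error still surfaces after the retries are exhausted, execute() raises HttpError as before, so the retry count only covers failures that clear up within that many attempts.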

If you are making more extensive changes, you might want to consider the google.cloud.pubsub_v1 library. It is easier to use than the autogenerated library that the example code you link to uses, and it retries by default.