Backlog in Google Cloud Pub/Sub


I am new to GCP, and while reading the documentation about auto-tuning by the Dataflow service, I see that it talks about backlog and the auto-scaling that depends on it. What exactly is the backlog in this context? If my pipeline is reading from a Pub/Sub subscription, is it the age of the oldest message or the number of unacknowledged messages?



Best answer:

The backlog in Dataflow isn't a Pub/Sub concept. Dataflow pulls messages from Pub/Sub as soon as they arrive, but the processing queue inside Dataflow can grow: that internal queue is the backlog. If the backlog becomes too large and CPU consumption is also too high, a new worker is added to the pipeline.
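The scale-up condition described above can be sketched as a simple heuristic. This is an illustrative sketch only: the function name and threshold values are hypothetical, not part of any Dataflow API, and the real service uses its own internal signals.

```python
# Hypothetical sketch of the scale-up decision described in the answer:
# a worker is added only when BOTH the internal backlog is large AND the
# existing workers' CPUs are already busy. Thresholds are made up.

def should_scale_up(backlog_size: int, cpu_utilization: float,
                    backlog_threshold: int = 10_000,
                    cpu_threshold: float = 0.8) -> bool:
    """Return True when both the backlog and CPU usage are high."""
    return backlog_size > backlog_threshold and cpu_utilization > cpu_threshold

print(should_scale_up(50_000, 0.95))  # large backlog and busy CPUs
print(should_scale_up(50_000, 0.30))  # a large backlog alone is not enough
```

The point of requiring both signals is that a large backlog with idle CPUs usually means the bottleneck is elsewhere (e.g. an external sink), so adding workers would not help.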

In streaming mode you still have a backlog, but you also have a predictive backlog: Dataflow compares the number of messages in successive time windows, and if the message count keeps increasing, that can be the beginning of a traffic spike, so Dataflow can scale up proactively.
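The predictive idea can be sketched by comparing message counts across recent windows. Again, this is a hypothetical illustration of the trend check described above, not Dataflow's actual algorithm; the growth factor is an assumption.

```python
# Hypothetical sketch: treat a run of strictly growing per-window message
# counts as the possible start of a spike. `min_growth` is a made-up factor.

def backlog_is_growing(window_counts: list[int], min_growth: float = 1.2) -> bool:
    """True if every window received at least `min_growth` times the
    messages of the previous window."""
    return all(curr >= prev * min_growth
               for prev, curr in zip(window_counts, window_counts[1:]))

print(backlog_is_growing([100, 150, 240]))  # sustained growth across windows
print(backlog_is_growing([100, 90, 200]))   # a dip breaks the trend
```

Scaling on the trend rather than the absolute backlog lets the service add workers before the queue actually becomes large.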