How can I make a Dataflow job use different worker pools for different stages of a workflow?


Is there a way to tie a worker pool to a particular stage of a workflow? I have a Dataflow workflow with one stage that can be run in parallel only inside a single worker (it updates a specific file database, which supports parallel updates but has to be stored locally). For this particular stage I would like to use a worker with more CPUs (n1-standard-64), while for all other stages I use n1-standard-4.

Dataflow supports worker pools, where I can describe how many instances must be created and which machineType must be used. Moreover, I can specify multiple worker pools, and the Javadoc says:

Note that a workflow job may use multiple pools, in order to match the various computational requirements of the various stages of the job.

The Dataflow REST client supports this as well: it expects an array when you specify workerPools.
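To make this concrete, here is a minimal sketch of the kind of job payload I mean. It assumes the field names `kind`, `machineType`, and `numWorkers` from the REST `WorkerPool` resource. The `build_environment` helper, the `"harness"` kind value, and the worker counts are placeholders of mine, not something taken from a working job. Note that nothing in this payload says which stage runs on which pool, and that is exactly the missing piece.

```python
import json


def build_environment(pools):
    """Build the `environment` part of a job body with several worker pools.

    `pools` is a list of (kind, machine_type, num_workers) tuples.
    This helper is hypothetical and only assembles a plain dict.
    """
    return {
        "workerPools": [
            {"kind": kind, "machineType": machine_type, "numWorkers": num_workers}
            for kind, machine_type, num_workers in pools
        ]
    }


# Two pools: small machines for most stages, one large machine for the
# stage that must run on a single worker (values are illustrative).
environment = build_environment([
    ("harness", "n1-standard-4", 10),
    ("harness", "n1-standard-64", 1),
])

# workerPools is serialized as a JSON array, as the REST client expects.
print(json.dumps({"environment": environment}, indent=2))
```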

So the question is: how do I tie a stage to a specific worker pool? Is there any way to do that?
