Throughput of TPL Dataflow pipeline


We have a TPL Dataflow pipeline with the following blocks (a simplified wiring sketch follows the list):

  • Transform Block A: HTTP POST call
  • Transform Block B: Database IO
  • Transform Block C: Unit conversion (basically a CPU-intensive task)
  • Transform Block D: Publish to Google PubSub
  • Action Block E: HTTP POST call

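A simplified sketch of how the blocks are wired. The Message type, helper method names and block bodies are illustrative placeholders, not our actual code:

```csharp
using System.Threading.Tasks.Dataflow;

// Illustrative wiring only; the real block bodies do the HTTP, database,
// unit-conversion and Pub/Sub work described above.
var blockA = new TransformBlock<Message, Message>(async m => await PostToServiceAsync(m));
var blockB = new TransformBlock<Message, Message>(async m => await EnrichFromDatabaseAsync(m));
var blockC = new TransformBlock<Message, Message>(m => ConvertUnits(m)); // CPU-intensive
var blockD = new TransformBlock<Message, Message>(async m => await PublishToPubSubAsync(m));
var blockE = new ActionBlock<Message>(async m => await PostResultAsync(m));

var link = new DataflowLinkOptions { PropagateCompletion = true };
blockA.LinkTo(blockB, link);
blockB.LinkTo(blockC, link);
blockC.LinkTo(blockD, link);
blockD.LinkTo(blockE, link);
```
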
We are trying to run this pipeline with maximum throughput (100% CPU utilization). What we have done so far (a sketch of these settings follows the list):

• Set MaxDegreeOfParallelism to 1000 for each block

• Used a semaphore to limit the maximum number of concurrent pipelines (currently 500)

• Messages to the first block in the pipeline are delivered by a Google PubSub subscription (with the flow control setting maxOutstandingElementCount = 100)

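In code, those three knobs look roughly like this. It is a simplified sketch: subscriptionName, Parse and blockA are placeholders, the semaphore handling is abbreviated, and the Pub/Sub part assumes the Google.Cloud.PubSub.V1 client, so the exact API may differ with your client version:

```csharp
using System.Threading;
using System.Threading.Tasks.Dataflow;
using Google.Api.Gax;
using Google.Cloud.PubSub.V1;

// Applied to every block in the pipeline.
var blockOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1000 };

// Caps the number of messages allowed inside the pipeline at once.
var pipelineSlots = new SemaphoreSlim(500);

// Subscriber with flow control limiting outstanding messages to 100.
var subscriber = await SubscriberClient.CreateAsync(
    subscriptionName,
    settings: new SubscriberClient.Settings
    {
        FlowControlSettings = new FlowControlSettings(
            maxOutstandingElementCount: 100,
            maxOutstandingByteCount: null)
    });

await subscriber.StartAsync(async (PubsubMessage msg, CancellationToken ct) =>
{
    await pipelineSlots.WaitAsync(ct);      // wait for a free pipeline slot
    await blockA.SendAsync(Parse(msg), ct); // hand the message to the first block
    return SubscriberClient.Reply.Ack;      // the slot is released when Block E finishes the message
});
```
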
Our results:

• 13,000 messages processed in 2.5 hours (i.e., ~87 messages per minute)

• 100% CPU utilization

• 450 threads

Now the question: can this performance be improved? We have a requirement of 50,000 messages in 10 minutes (assuming no data is fetched from the database in Block B). Alternatively, please suggest where we should try to optimize our code.

Machine used:

• Processor: Intel® Xeon® CPU E3-1505M v5 @ 2.80GHz

• RAM: 32 GB

• System type: 64-bit OS


There is 1 answer below.


You should optimize for maximum throughput regardless of CPU load, especially because you have mostly IO-related work (HTTP calls, database access, etc.) where the CPU isn't doing much anyway.

Parallelism should be based on the number of CPU cores you have available. The server cannot physically process more things at the same time than it has cores, so using a very large number of threads won't help; it just creates more overhead for the same computing bandwidth and will actually slow everything down.

Parallelism for CPU-bound work should be limited to the number of cores so that each core is working at full capacity but not overloaded. IO-bound work is asynchronous and spends most of its time waiting for the IO to complete, so it can use a higher degree of parallelism to keep more requests in flight at the same time.

A good start would be (# of cores) x 4 for the IO blocks and (# of cores) for the CPU-intensive blocks. That means if you have 4 CPU cores available, use 4 for CPU blocks, and 16 for the IO blocks. Try that and scale up the parallelism on the IO blocks until you hit max throughput.
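
As a rough sketch of that sizing (the block names follow the question and the delegates are placeholders; the 4x multiplier is only a starting point to tune from):

```csharp
using System;
using System.Threading.Tasks.Dataflow;

int cores = Environment.ProcessorCount;

// CPU-bound block: one work item per core.
var cpuOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = cores };

// IO-bound blocks: mostly waiting on HTTP / database / Pub/Sub, so allow more in flight.
var ioOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = cores * 4 };

var blockA = new TransformBlock<Message, Message>(PostToServiceAsync, ioOptions);      // HTTP
var blockB = new TransformBlock<Message, Message>(EnrichFromDatabaseAsync, ioOptions); // database
var blockC = new TransformBlock<Message, Message>(ConvertUnits, cpuOptions);           // CPU
var blockD = new TransformBlock<Message, Message>(PublishToPubSubAsync, ioOptions);    // Pub/Sub
var blockE = new ActionBlock<Message>(PostResultAsync, ioOptions);                     // HTTP
```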

Also note that various options on the Dataflow blocks, like EnsureOrdered (which preserves strict input ordering), can be tweaked for more performance depending on your requirements. TPL Dataflow can easily process 100k+ items per second if properly configured.
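
For example, if the output order of each block does not have to match its input order, EnsureOrdered can be turned off, and BoundedCapacity keeps any one block from buffering unbounded work. A sketch; whether these settings are appropriate depends on your requirements:

```csharp
var ioOptions = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount * 4,
    EnsureOrdered = false,   // don't hold finished items back just to preserve input order
    BoundedCapacity = 200    // cap the block's input queue so memory use stays bounded
};
```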