Storm: when to use setNumTasks?


I'm curious about the circumstances that would necessitate the use of the setNumTasks function. The docs say that the default is one task for each executor.

If I have an 'expensive' DB task (calls to external databases that take time) running in a bolt, with 'fast' tasks on either side, would it be worth adding extra tasks for it?

Or is this one of those 'try it and see what happens' sort of scenarios?


There are 2 best solutions below

1
On BEST ANSWER
  • The number of tasks is always >= the number of executors, and it is fixed once the topology is submitted.
    • The number of executors can be changed without killing the topology, but the constraint num tasks >= num executors must still hold. That is, if you have more tasks than executors, you can rebalance your topology and give it more executors.
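
To make the constraint concrete, here is a minimal sketch in plain Java. It is a simplified model, not Storm's scheduler code: the class name, the method name and the round-robin assignment are assumptions made for illustration only. It shows a fixed set of tasks being spread over a changeable number of executors, and the request being rejected once there are more executors than tasks.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (NOT Storm's scheduler): spread a fixed set of task ids
// over a changeable number of executors.
final class TaskDistributionSketch {

    // Returns one list of task ids per executor; rejects numTasks < numExecutors.
    static List<List<Integer>> distribute(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numTasks < numExecutors) {
            throw new IllegalArgumentException(
                "need numTasks >= numExecutors >= 1, got tasks=" + numTasks
                    + ", executors=" + numExecutors);
        }
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        // Round-robin assignment is an assumption made for illustration.
        for (int task = 0; task < numTasks; task++) {
            executors.get(task % numExecutors).add(task);
        }
        return executors;
    }

    public static void main(String[] args) {
        int numTasks = 4; // fixed when the topology is submitted
        System.out.println(distribute(numTasks, 2)); // [[0, 2], [1, 3]]
        System.out.println(distribute(numTasks, 4)); // [[0], [1], [2], [3]]
    }
}
```

With 4 tasks you can go from 2 executors to 4, but asking for 5 throws, which is the reason to declare spare tasks up front if you expect to scale a bolt later.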

How do you decide how many executors/tasks you need?

  • Look for bottlenecks. The one you pointed out is a good example: the latency of accessing an external data source (look at the bolt's process latency in the Storm UI). In this case you can (and probably should) have more execution units on this bolt, and if you have more tasks than executors, you can rebalance so that the "spare" tasks get executors of their own.
  • Another bottleneck is CPU usage (look at the bolt's capacity in the Storm UI); bolts that are more CPU-intensive will require more execution units.
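
As a rough way to turn those two UI numbers into a starting parallelism, here is a back-of-the-envelope sketch in plain Java. The class and method names are hypothetical, and so are the sample numbers. The capacity formula follows my understanding of how the Storm UI describes it (tuples executed times average execute latency, divided by the measurement window), so treat it as an assumption and check it against your Storm version.

```java
// Back-of-the-envelope sizing helpers; names and numbers are hypothetical.
final class BoltSizingSketch {

    // Fraction of the window an executor spent executing tuples
    // (assumed formula: executed * average execute latency / window).
    static double capacity(long executed, double executeLatencyMs, long windowMs) {
        return executed * executeLatencyMs / windowMs;
    }

    // Rough lower bound on executors for a blocking bolt:
    // arrival rate (tuples/s) times the time spent per tuple (s).
    static int minExecutors(double tuplesPerSecond, double executeLatencyMs) {
        return (int) Math.ceil(tuplesPerSecond * executeLatencyMs / 1000.0);
    }

    public static void main(String[] args) {
        // 200 tuples/s, each waiting 50 ms on the database.
        System.out.println(minExecutors(200, 50)); // 10
        // 120,000 tuples at 4 ms each within a 10-minute window.
        System.out.println(capacity(120_000, 4.0, 600_000)); // 0.8
    }
}
```

A capacity close to 1.0 suggests the bolt is running about as fast as it can, so it is a candidate for more executors. The executor estimate is only a lower bound and leaves no headroom for bursts.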

I recommend you read this page

0
On

I just verified this and found why there's this confusion about tasks.

In this case:

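int BoltParallelism = 3;      // parallelism hint: number of executors (threads)
int BoltTaskParallelism = 6;  // number of tasks; must be >= the number of executors
builder.setBolt("bolt1", new BoltA(), BoltParallelism)
                .setNumTasks(BoltTaskParallelism);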

BoltParallelism is indeed the number of executors and BoltTaskParallelism is indeed the number of tasks.

BUT

int BoltParallelism = 3;
builder.setBolt("bolt1", new BoltA(), BoltParallelism);

When you don't call setNumTasks, Storm creates BoltParallelism tasks and BoltParallelism executors, i.e. one task per executor.

If you create 3 tasks, Storm creates 3 instances of BoltA. If the expensive DB read happens in one instance of BoltA, the other instances are most likely doing the same thing, because they are the same class. However, if BoltA is written so that it does a DB read under some conditions and other processing under others, then yes, it is worth having more tasks, and it is worth running every task in its own executor (thread): if you have 3 tasks and only one executor, the executor runs the tasks one by one.
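
The "one by one" point can be illustrated without Storm at all. The sketch below is an analogy, not Storm code: a plain Java thread pool stands in for the executors, a sleep stands in for the slow DB call, and the class and method names are made up for the example. It reports the highest number of calls that were in flight at the same time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Analogy only: pool threads play the role of executors, submitted jobs the role of tasks.
final class SerialVsParallelSketch {

    // Runs numTasks simulated slow calls on numExecutors threads and returns
    // the peak number of calls that were running concurrently.
    static int peakConcurrency(int numTasks, int numExecutors, long callMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numExecutors);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < numTasks; i++) {
            pool.submit(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(callMillis); // stands in for the slow DB call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(peakConcurrency(3, 1, 100)); // 1: calls run one after another
        System.out.println(peakConcurrency(3, 3, 100)); // up to 3: calls can overlap
    }
}
```

With a single thread the peak is always 1, so adding tasks without adding executors does not make the slow calls overlap.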