It seems there is no concept of executors in JStorm, and the setTasksNumber() method seems useless because the number of tasks is determined by parallelism_hint.

My questions: are tasks in JStorm static? If not, when a task dies, will it restart? And if tasks are not static, how does fields grouping work?
In JStorm, a worker behaves like an executor in Storm. A worker can run multiple tasks, but unlike in Storm, the tasks within a worker may belong to different components. Let's look at an example:
Suppose a topology contains a spout (S) and 2 bolts (B1, B2). The task number of each component is set when calling the TopologyBuilder.buildTopology method, specifically via the parallelism hint in the TopologyBuilder.setBolt method (and TopologyBuilder.setSpout for spouts). So let's say you set S's parallelism to 2, B1's to 3, and B2's to 4. We'll have 2 + 3 + 4 = 9 tasks in total.
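For reference, the example above could be defined roughly like this. This is only a sketch of a topology definition using the standard TopologyBuilder/Config API; the component classes S, B1, B2, the component IDs, and the shuffle groupings are placeholder assumptions, and it needs a JStorm/Storm distribution on the classpath to actually compile and submit:

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class ExampleTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // The parallelism hint sets the task count of each component:
        builder.setSpout("S", new S(), 2);                        // 2 tasks
        builder.setBolt("B1", new B1(), 3).shuffleGrouping("S");  // 3 tasks
        builder.setBolt("B2", new B2(), 4).shuffleGrouping("B1"); // 4 tasks

        Config conf = new Config();
        conf.setNumWorkers(3); // 9 tasks spread across 3 workers

        StormSubmitter.submitTopology("example", conf, builder.createTopology());
    }
}
```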
Then you may set the total worker number to 3 by calling the Config.setNumWorkers() method. After scheduling the workers and tasks, we get task IDs and components like this:
B1: task IDs 1, 2, 3
S:  task IDs 4, 5
B2: task IDs 6, 7, 8, 9
Note that task IDs within the same component are consecutive, but the numbering doesn't necessarily start with the spouts.
Then we have the following scheduled workers and tasks:
Worker1: tasks 1, 4, 6
Worker2: tasks 2, 5, 7
Worker3: tasks 3, 8, 9
As we can see, each worker holds 3 tasks, possibly from different components. Note that JStorm's scheduling algorithm is somewhat similar to Storm's default scheduling algorithm (but more powerful); for a comparison, please refer to: https://issues.apache.org/jira/browse/STORM-1320
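The assignment above can be reproduced with a small self-contained sketch (plain Java, no JStorm dependency). The consecutive-per-component task numbering and the capacity-limited round robin below are simplifications chosen to match the example's result, not JStorm's actual scheduler:

```java
import java.util.*;

public class SchedulingSketch {
    // Assigns consecutive task IDs per component, then round-robins each
    // component's tasks over the workers starting from the first worker,
    // skipping workers that are already full. Illustration only; this is
    // not JStorm's real scheduling algorithm.
    static List<List<Integer>> schedule(LinkedHashMap<String, Integer> components,
                                        int numWorkers) {
        int totalTasks = components.values().stream().mapToInt(Integer::intValue).sum();
        int capacity = (totalTasks + numWorkers - 1) / numWorkers; // tasks per worker

        List<List<Integer>> workers = new ArrayList<>();
        for (int w = 0; w < numWorkers; w++) workers.add(new ArrayList<>());

        int nextId = 1; // task IDs within one component are consecutive
        for (int parallelism : components.values()) {
            int w = 0;   // each component starts again at the first worker
            for (int i = 0; i < parallelism; i++) {
                while (workers.get(w % numWorkers).size() >= capacity) w++;
                workers.get(w % numWorkers).add(nextId++);
                w++;
            }
        }
        return workers;
    }

    public static void main(String[] args) {
        // Insertion order matters: B1's tasks are numbered first, as in the example.
        LinkedHashMap<String, Integer> components = new LinkedHashMap<>();
        components.put("B1", 3);
        components.put("S", 2);
        components.put("B2", 4);

        List<List<Integer>> workers = schedule(components, 3);
        for (int w = 0; w < workers.size(); w++) {
            System.out.println("Worker" + (w + 1) + ": " + workers.get(w));
        }
        // Prints:
        // Worker1: [1, 4, 6]
        // Worker2: [2, 5, 7]
        // Worker3: [3, 8, 9]
    }
}
```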
While your topology is running, as long as you don't perform a rebalance operation, the scheduling result stays the same: no matter which host + port (worker slot) is assigned, the tasks within each worker remain the same. Even if you restart the topology, the scheduling result is unchanged as long as you don't change the parallelism of your components. If you do perform a rebalance, however, the tasks may change.
When a task in a worker dies (by throwing an unchecked/unhandled exception), the whole worker is killed and the error is reported to ZooKeeper. The worker is then rescheduled immediately. "Reschedule" may not be entirely accurate here: nimbus knows the worker is dead and simply tries to restart it elsewhere, but the tasks within the restarted worker are exactly the same.

For more JStorm docs, please refer to: https://github.com/alibaba/jstorm