Storm: when to use setNumTasks?
I'm interested in the circumstances that would require using the setNumTasks function. The docs say the default is one task per executor.
If I have a bolt that runs an "expensive" DB task (calls to external databases that take a while), with bolts running "fast" tasks on both sides of it, would I add additional tasks for that bolt?
Or is this one of those "try it and see what happens" situations?
- The number of tasks is always >= the number of executors.
- The number of executors can be changed (without killing the topology), but the number of tasks is fixed once the topology is submitted, and the executor count can never exceed it. This means that if you have more tasks than executors, you can rebalance your topology and give it more executors.
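To make the two rules above concrete, here is a minimal Java sketch that does not use Storm at all. The class TaskRebalanceSketch and its assign method are hypothetical; they assume a component's task ids 1..N are fixed at submission and are split into contiguous, near-equal groups, one group per executor thread. Rebalancing only changes how many groups there are, and asking for more executors than tasks is rejected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not Storm code): a component's task ids are fixed when
// the topology is submitted; only the executor count changes on rebalance.
final class TaskRebalanceSketch {

    // Split task ids 1..numTasks into numExecutors contiguous, near-equal groups.
    // Each group is what one executor (thread) would run.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            // The invariant from the answer: tasks >= executors.
            throw new IllegalArgumentException("need 1 <= executors <= tasks, got "
                    + numExecutors + " executors for " + numTasks + " tasks");
        }
        int base = numTasks / numExecutors;   // every executor gets at least this many tasks
        int extra = numTasks % numExecutors;  // the first `extra` executors get one more
        List<List<Integer>> groups = new ArrayList<>();
        int nextTask = 1;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> group = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                group.add(nextTask++);
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        int numTasks = 4;                          // fixed, like setNumTasks(4)
        System.out.println(assign(numTasks, 2));   // before rebalance: [[1, 2], [3, 4]]
        System.out.println(assign(numTasks, 4));   // after rebalance:  [[1], [2], [3], [4]]
    }
}
```

With 4 tasks you can go from 2 to 4 executors without resubmitting, but not to 5. On a real cluster the executor change is made with the storm rebalance command (the Storm docs show an -e component=executors option for it).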
How do you decide how many executors/tasks you need?
- Find the bottlenecks. What you listed is a good example: latency when accessing an external data source (look at the bolt's process latency in the Storm UI). In this case you can (and probably should) run more executors for that bolt, and if you have "spare" tasks, you can rebalance so they get executors of their own.
- Another bottleneck is CPU usage (look at the bolt's capacity in the Storm UI): more CPU-intensive bolts will need more executors.
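The two checks above can be turned into rough numbers. This is a hedged Java sketch with hypothetical helper names, not Storm API. capacity follows my reading of the Storm UI tooltip, (tuples executed * average execute latency) / measurement window, where values near 1.0 mean the bolt is saturated. suggestedExecutors is a rule of thumb based on Little's law (busy threads ≈ arrival rate × time per tuple), which is my assumption rather than official Storm guidance.

```java
// Hypothetical sizing helpers; not part of Storm's API.
final class BoltSizingSketch {

    // Capacity as the Storm UI tooltip describes it (my reading):
    // (tuples executed * average execute latency) / measurement window.
    // Values near 1.0 mean the bolt's executors are busy almost all the time.
    static double capacity(long executed, double avgExecuteLatencyMs, long windowMs) {
        return executed * avgExecuteLatencyMs / windowMs;
    }

    // Rule of thumb from Little's law (an assumption, not Storm guidance):
    // busy executors ~= arrival rate * time per tuple, divided by a target
    // utilization (in percent) to leave headroom, rounded up.
    static long suggestedExecutors(long tuplesPerSecond, long latencyMs, long targetPercent) {
        long busyWork = tuplesPerSecond * latencyMs * 100;  // scaled up to stay in integers
        long perExecutor = 1000L * targetPercent;            // ms per second * target utilization
        return (busyWork + perExecutor - 1) / perExecutor;   // integer ceiling division
    }

    public static void main(String[] args) {
        // 60,000 tuples at 8 ms each over a 10-minute (600,000 ms) window
        System.out.println(capacity(60_000, 8.0, 600_000));
        // 200 tuples/s hitting a 50 ms DB call, aiming for 80% utilization
        System.out.println(suggestedExecutors(200, 50, 80));
    }
}
```

For the example inputs the capacity is about 0.8, and the estimate is 13 executors (200 × 0.05 s = 10 busy threads, divided by 0.8 and rounded up). Treat it as a starting point to verify in the Storm UI, not a final answer.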
I recommend you read this page
I just tested this and found where the confusion about tasks comes from.
In this case:
int BoltParallelism = 3;
int BoltTaskParallelism = 2;
builder.setBolt("bolt1", new BoltA(), BoltParallelism)
    .setNumTasks(BoltTaskParallelism);
BoltParallelism is really the number of executors, and BoltTaskParallelism is really the number of tasks.
BUT
int BoltParallelism = 3;
builder.setBolt("bolt1", new BoltA(), BoltParallelism);
If you don't call setNumTasks, Storm creates BoltParallelism tasks and BoltParallelism executors.
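The defaulting rule can be modeled in a few lines. BoltSpecSketch below is a hypothetical stand-in for a bolt declaration, not Storm's BoltDeclarer. It assumes the executor count is simply the parallelism hint and that the task count falls back to that hint when setNumTasks was never called. The explicit case uses 2 executors and 4 tasks so that it stays within the tasks >= executors rule.

```java
// Hypothetical stand-in for a bolt declaration; not Storm's BoltDeclarer.
final class BoltSpecSketch {
    private final int parallelismHint;  // third argument to setBolt(...)
    private Integer numTasks;           // null until setNumTasks(...) is called

    BoltSpecSketch(int parallelismHint) {
        this.parallelismHint = parallelismHint;
    }

    BoltSpecSketch setNumTasks(int numTasks) {
        this.numTasks = numTasks;
        return this;                    // chainable, like the builder call
    }

    // Executors (threads) start out as the parallelism hint.
    int executors() {
        return parallelismHint;
    }

    // Tasks (bolt instances) default to the parallelism hint when not set.
    int tasks() {
        return numTasks != null ? numTasks : parallelismHint;
    }

    public static void main(String[] args) {
        BoltSpecSketch defaulted = new BoltSpecSketch(3);
        BoltSpecSketch explicit = new BoltSpecSketch(2).setNumTasks(4);
        System.out.println(defaulted.executors() + " executors, " + defaulted.tasks() + " tasks");
        System.out.println(explicit.executors() + " executors, " + explicit.tasks() + " tasks");
    }
}
```

This prints "3 executors, 3 tasks" for the defaulted bolt and "2 executors, 4 tasks" for the explicit one.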
If you create 3 tasks, Storm creates 3 instances of BoltA. If your expensive DB read happens in one instance of BoltA, it will most likely happen in the other instances too, because they are all the same class. However, if you write BoltA so that it reads the database under some conditions and does other processing under others, then yes, it is worth having more tasks, and it is worth giving each task its own executor (thread): with 3 tasks and only one executor, that executor runs the tasks one after another.
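The last point, that tasks sharing one executor run one after another, can be shown with a small timing model. ExecutorSerialSketch is hypothetical and ignores CPU contention, queueing, and the DB's own limits. It assumes tasks are dealt to executors round-robin, that work on the same executor adds up, and that different executors run in parallel.

```java
// Hypothetical timing model: each executor is one thread that runs the
// execute() calls of all its tasks one after another.
final class ExecutorSerialSketch {

    // taskWorkMs[i] = time task i spends (e.g. on a DB call) per round of tuples.
    // Tasks are dealt to executors round-robin (a simplifying assumption).
    // Returns the wall-clock time until every task has finished its round.
    static long wallClockMs(long[] taskWorkMs, int numExecutors) {
        long[] busyMs = new long[numExecutors];
        for (int task = 0; task < taskWorkMs.length; task++) {
            busyMs[task % numExecutors] += taskWorkMs[task];  // same thread => work adds up
        }
        long max = 0;
        for (long b : busyMs) {
            max = Math.max(max, b);                           // executors run in parallel
        }
        return max;
    }

    public static void main(String[] args) {
        long[] threeDbTasks = {100, 100, 100};
        System.out.println(wallClockMs(threeDbTasks, 1));  // 300: one thread, one task at a time
        System.out.println(wallClockMs(threeDbTasks, 3));  // 100: one thread per task
    }
}
```

Under these assumptions, adding tasks alone does not speed up a slow DB bolt. The gain comes from spreading those tasks over more executors.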