How is work divided among Storm workers?
How does Apache Storm split tasks among its workers? I read that Storm does this on its own as part of its parallelism feature, but I don't know how to determine which node does what, or how many nodes will perform a given task. Mostly I want this so I can calculate the optimal number of nodes required, given that the nodes do not all have the same hardware configuration.
By default, Storm uses round-robin scheduling: it iterates over all supervisors with available slots and assigns the parallel spout / bolt instances to them. If no more free slots are available, multiple spout / bolt instances are assigned to a single worker.
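To make the round-robin idea concrete, here is a minimal sketch in Python. It is a simplified model and not Storm's actual scheduler code: the function name `assign_round_robin`, the host names, and the slot ports are all made up for illustration. Note that in this model every slot is treated as equal, so differences in node hardware are not taken into account.

```python
from itertools import cycle


def assign_round_robin(executors, slots):
    """Hand out executors to worker slots one at a time, wrapping around.

    Simplified model (assumption): every slot is treated as identical.
    """
    if not slots:
        raise ValueError("no worker slots available")
    assignment = {slot: [] for slot in slots}
    # cycle() restarts at the first slot once every slot has been used,
    # so extra executors end up sharing a slot.
    for executor, slot in zip(executors, cycle(slots)):
        assignment[slot].append(executor)
    return assignment


# Hypothetical cluster: slots are (supervisor host, port) pairs.
slots = [("node-a", 6700), ("node-b", 6700), ("node-a", 6701)]
executors = ["spout-1", "bolt-1", "bolt-2", "bolt-3", "bolt-4"]

plan = assign_round_robin(executors, slots)
for slot, assigned in plan.items():
    print(slot, assigned)
```

With five instances and three slots, the first two slots each end up hosting two instances, which is the "multiple instances in a single worker" case.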
You need to look at the Storm UI. The metrics there (complete latency, capacity, execute latency, process latency, and failed tuples) will give you clues about how many workers and tasks you should allocate for each bolt.
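As a rough illustration of how the capacity metric can guide sizing: the UI describes capacity as (number of tuples executed * average execute latency) / measurement window, and a value near 1.0 means the bolt is running as fast as it can. The sketch below is only a back-of-the-envelope estimate. The helper names, the sample numbers, and the target capacity of 0.5 are my own assumptions, and it assumes load spreads evenly and scales linearly when you add executors, which will not hold exactly on nodes with different hardware.

```python
import math


def capacity(executed, execute_latency_ms, window_ms):
    """Fraction of the window the bolt spent executing tuples."""
    return executed * execute_latency_ms / window_ms


def suggested_executors(current, cap, target=0.5):
    """Scale the executor count so capacity would drop to about `target`.

    Assumption: work divides evenly and linearly across executors.
    """
    return max(1, math.ceil(current * cap / target))


# Hypothetical readings for one bolt over a 10 minute window.
cap = capacity(executed=450_000, execute_latency_ms=1.0, window_ms=600_000)
needed = suggested_executors(current=4, cap=cap, target=0.5)
print(cap, needed)
```

Treat the result as a starting point: change the parallelism, then check the same metrics again, since failed tuples and complete latency can point to bottlenecks that capacity alone does not show.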