Disabling nested parallelism in Intel TBB

Consider the following scenario: I am writing a function that contains a computationally intensive loop, and I have parallelized it with TBB parallel_for. The function can be used on its own, where it benefits from the parallelization, or it can be called inside another loop. In the latter case, the outer loop can also be parallelized, and it is often best to parallelize the outer loop.

Usually in TBB, parallelizing both the outer and the inner loop is not a problem because, unlike OpenMP, nested parallelism in TBB does not lead to the creation of additional threads. TBB just creates more tasks. However, in some cases the overhead of creating more tasks in the inner loop is still undesirable (I've seen a 40% slowdown in one extreme situation).

So, is there a way to prevent TBB from creating any tasks when parallel_for etc. is called while another parallel_for algorithm is already executing? Something similar to OMP_NESTED=FALSE in OpenMP.

+3




1 answer


Simple answer: No

Simple tip: don't use simple_partitioner

It is impossible to influence parallel_for or other algorithms from outside or from an outer level, except by limiting their concurrency with task_scheduler_init or task_arena. These are not suited to controlling nested parallelism anyway, though.



Anyway, there shouldn't be much of a performance impact if you use auto_partitioner (especially at the nested level) and follow TBB's best practices for efficient parallelization.

Though I admit it can be a problem in extreme cases. We (the TBB developers) have thought about optimizing the automatic splitting parameters of parallel_for depending on the context in which it is executed. But the problem is that knowing whether we are at a nested level or not is not enough to reliably determine the parameters. For example, consider a parallel_for started from a single task: it is formally nested, but there is no outer parallelism. Some parts of the task scheduler would need to be significantly redesigned to provide information about the number of workers active at any given moment before this idea could be implemented.

+2

