How many threads should I create?

Based on this question, I have a class whose constructor only does some assignments, plus a member function build() that actually does the work.

I know that the number of objects that I will need to build is in the range [2, 16]. The actual number is a custom parameter.

I create my objects in a for loop like this:

for (int i = 0; i < n; ++i) {
  roots.push_back(RKD<DivisionSpace>(...));
}

      

and then in another loop I create threads. Each thread calls build() on a chunk of objects based on this logic:

If your vector has n elements and you have p threads, thread i only writes to elements

[i * n / p, (i + 1) * n / p).

So, for example, with three threads the situation is as follows:

std::vector<RKD<Foo>> foos;
// here is a for loop that pushes back 'n' objects to foos

// thread A         // thread B                 // thread C
foos[0].build();    foos[n / 3 + 0].build();    foos[2 * n / 3 + 0].build();
foos[1].build();    foos[n / 3 + 1].build();    foos[2 * n / 3 + 1].build();
foos[2].build();    foos[n / 3 + 2].build();    foos[2 * n / 3 + 2].build();
...                 ...                         ...
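
The chunking rule above could be sketched roughly as follows. This is only an illustration, not the code from the question: `Item` is a hypothetical stand-in for RKD<Foo>, and `build_in_chunks` is a made-up helper name. Thread i touches only the half-open range [i * n / p, (i + 1) * n / p), so no two threads write to the same element.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for RKD<Foo>; only build() matters here.
struct Item {
    bool built = false;
    void build() { built = true; }
};

// Hypothetical helper: thread i builds items[i*n/p, (i+1)*n/p).
void build_in_chunks(std::vector<Item>& items, std::size_t p) {
    assert(p > 0);
    const std::size_t n = items.size();
    std::vector<std::thread> workers;
    workers.reserve(p);
    for (std::size_t i = 0; i < p; ++i) {
        const std::size_t first = i * n / p;
        const std::size_t last = (i + 1) * n / p;
        // Each thread gets its own disjoint index range.
        workers.emplace_back([&items, first, last] {
            for (std::size_t k = first; k < last; ++k) {
                items[k].build();
            }
        });
    }
    // Wait for every chunk to finish before returning.
    for (auto& w : workers) {
        w.join();
    }
}
```

With n = 10 and p = 3 the ranges come out as [0, 3), [3, 6) and [6, 10), so the leftover elements land in the last chunk.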

      


The approach I used was to determine the number of threads p as follows:

p = min(n, P) 

      

where n is the number of objects I want to create and P is the return value of std::thread::hardware_concurrency. After struggling with some of the problems this C++11 feature has, I read the following:

Even when hardware_concurrency is implemented, it cannot be relied on as a direct mapping to the number of cores. What the standard says it returns is the number of hardware thread contexts. It goes on to state that this value should only be considered a hint. If your machine has hyper-threading enabled, it is quite possible that the returned value will be 2x the number of cores. If you want a reliable answer, you will need to use whatever facilities your OS provides. - Praetorian
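
One way to treat the value as a hint while still computing p = min(n, P) might look like the sketch below. The helper names `clamp_thread_count` and `choose_thread_count` and the fallback of 2 are assumptions made for illustration. The one fact relied on is that std::thread::hardware_concurrency() may return 0 when the value is not computable.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: p = min(n, P), where P is only a hint.
// A hint of 0 means "unknown", so a caller-chosen fallback is used
// instead (the default of 2 is an arbitrary assumption).
unsigned clamp_thread_count(unsigned n, unsigned hint, unsigned fallback = 2) {
    assert(n > 0 && fallback > 0);
    const unsigned P = (hint != 0) ? hint : fallback;
    return std::min(n, P);
}

// Hypothetical convenience wrapper that queries the standard library hint.
unsigned choose_thread_count(unsigned n) {
    return clamp_thread_count(n, std::thread::hardware_concurrency());
}
```

Since n is at most 16 here, the result never exceeds 16 threads regardless of what the platform reports.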

This means that I should probably change the approach, since this code is intended to be executed by multiple users (and I mean not only on my system, many people will run this code). So I would like to choose the number of threads in a way that is both standard and efficient. Since the number of objects is relatively small, is there some rule of thumb to follow?



1 answer


Just use a thread pool with hardware_concurrency threads and queue up the items on a first-come, first-served basis.

If other processes in the system somehow get priority from the OS, so be it. It simply means that fewer threads than the allocated pool size (for example P - 1) can run concurrently. It doesn't matter, because the first available pool thread that has finished build()-ing one item will pick the next item from the queue.
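
A minimal sketch of that idea follows, under some assumptions: because all items exist before the threads start, an atomic counter stands in for a real work queue here, and `Node` and `build_with_pool` are hypothetical names. Whichever thread finishes first simply claims the next unbuilt index.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the objects being built.
struct Node {
    bool built = false;
    void build() { built = true; }
};

// Hypothetical helper: pool_size threads pull indices until none remain.
void build_with_pool(std::vector<Node>& items, unsigned pool_size) {
    assert(pool_size > 0);
    std::atomic<std::size_t> next{0};  // index of the next unclaimed item
    auto worker = [&items, &next] {
        for (;;) {
            // Atomically claim one index; each index is handed out once.
            const std::size_t k = next.fetch_add(1);
            if (k >= items.size()) {
                return;  // nothing left to build
            }
            items[k].build();
        }
    };
    std::vector<std::thread> pool;
    pool.reserve(pool_size);
    for (unsigned t = 0; t < pool_size; ++t) {
        pool.emplace_back(worker);
    }
    for (auto& th : pool) {
        th.join();
    }
}
```

If one thread is descheduled by the OS, the others keep draining the remaining indices, so a pool that is slightly too large costs little.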

To really avoid threads competing for a single core, you could

  • use a semaphore (an interprocess semaphore if you actually want to coordinate builder threads across separate processes)

  • use thread affinity (to prevent the OS from scheduling a particular thread onto a different core in the next time slice); unfortunately, I don't think there is a standard, platform-independent way to set thread affinities (yet).

I see no good reason to make it more complicated than that.
