The optimal number of processes?

What is the optimal number of processes per core? Say you are given a machine with two processors with four cores each — how many processes will give you the best performance?

Thank you for your help.



1 answer


The answer is, naturally, that it depends. Obviously, if you care about the performance of one particular single-threaded application, any other processes will simply clutter up your machine and compete with it for shared resources. So let's consider the two cases where this question is actually interesting:

  • You have multiple processes running (say, identical ones) and you care about aggregate performance.
  • You are running a multi-threaded application that can spawn as many threads as it wants.

The second case is easier to answer: it (.. wait for it ..) depends on what you are running! If you use locks, more threads can mean heavier contention and more conflicts. If you are lock-free (or even wait-free to some degree), you may still run into fairness issues. It also depends on how well the work is balanced inside your application and on how your task scheduler behaves. There are simply too many possible designs to give a single number.
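To make the contention point a bit more concrete, here is a minimal sketch in Python that times a fixed amount of work while every worker has to go through one shared lock. It uses processes rather than threads only to sidestep Python's GIL; the counts and the trivial increment workload are made-up placeholders. The thing to watch is that wall-clock time stops improving, or even gets worse, as you add workers, because they all serialize on the lock.

    import multiprocessing as mp
    import time

    TOTAL_INCREMENTS = 200_000  # fixed total amount of work, split across workers

    def worker(counter, lock, n):
        # Every increment goes through the single shared lock,
        # so all workers serialize on it.
        for _ in range(n):
            with lock:
                counter.value += 1

    def run(num_workers):
        counter = mp.Value("i", 0, lock=False)  # raw shared int; we lock explicitly
        lock = mp.Lock()
        per_worker = TOTAL_INCREMENTS // num_workers
        procs = [mp.Process(target=worker, args=(counter, lock, per_worker))
                 for _ in range(num_workers)]
        start = time.perf_counter()
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return time.perf_counter() - start

    if __name__ == "__main__":
        for n in (1, 2, 4, 8):
            print(f"{n} workers: {run(n):.3f} s")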

If we assume your threads are perfectly balanced and that adding more of them carries no extra overhead, this case becomes equivalent to the other one, where you simply launch multiple independent processes. In that case, performance can have a few sweet spots. The first is when you reach the number of physical cores (in your case 8, if you have 4 physical cores per socket); at that point you have saturated the existing hardware. However, if you have some SMT feature (such as Hyper-Threading), you can double that count by using 2 logical cores per physical core. This does not add any resources, it just shares the existing ones, so each individual process may pay some penalty, but on the other hand you can run twice as many processes at the same time.
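To actually find those sweet spots on a given machine, the usual approach is to sweep the process count and measure aggregate throughput. Here is a rough sketch of such a sweep in Python; the busy-loop workload and the counts tried are arbitrary placeholders, and note that os.cpu_count() reports logical cores, i.e. it already includes the SMT siblings.

    import multiprocessing as mp
    import os
    import time

    WORK_ITEMS_PER_PROCESS = 20  # arbitrary; pick something that runs for a few seconds

    def cpu_bound_item(_):
        # Placeholder CPU-bound task; substitute your real workload here.
        s = 0
        for i in range(2_000_000):
            s += i * i
        return s

    def throughput(num_procs):
        # Aggregate throughput = total items completed / wall-clock time.
        items = num_procs * WORK_ITEMS_PER_PROCESS
        start = time.perf_counter()
        with mp.Pool(processes=num_procs) as pool:
            pool.map(cpu_bound_item, range(items))
        return items / (time.perf_counter() - start)

    if __name__ == "__main__":
        logical = os.cpu_count() or 1
        for n in sorted({1, max(1, logical // 2), logical, logical * 2}):
            print(f"{n:3d} processes: {throughput(n):6.1f} items/s")

On a machine like the one in the question you would expect throughput to climb up to 8 processes, possibly get a smaller bump up to 16 with Hyper-Threading enabled, and flatten out after that.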



The total aggregate speedup can vary, but I have seen something like 30% on average across common benchmarks. As a rule of thumb, processes that are bound by memory latency or have complicated control flow tend to benefit, because the core can keep making progress on one thread while the other is stalled. Code that is bound by execution bandwidth (such as heavy floating-point computation) or by memory bandwidth will not gain as much.

Beyond that number of processes, it can still be useful in some cases to add more. They will not run in parallel, but if the context-switch overhead is not too high and you want to minimize the average wait (which is also a valid way to look at performance, not just raw throughput), or you depend on getting some output back as early as possible, there are scenarios where this helps.
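If the metric you care about is latency rather than throughput, one way to check whether oversubscription helps for your workload is to compare the average completion time with a pool sized to the core count against an oversubscribed pool. A minimal sketch under those assumptions; the task body and task count are placeholders, and which configuration wins depends entirely on your workload.

    import os
    import time
    from concurrent.futures import ProcessPoolExecutor, as_completed

    NUM_TASKS = 32

    def task(i):
        # Placeholder CPU-bound task; replace with real work.
        s = 0
        for j in range(1_500_000):
            s += j * j
        return i

    def average_completion(workers):
        # Average time from submission until each individual task finishes.
        start = time.perf_counter()
        done_times = []
        with ProcessPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(task, i) for i in range(NUM_TASKS)]
            for fut in as_completed(futures):
                fut.result()
                done_times.append(time.perf_counter() - start)
        return sum(done_times) / len(done_times)

    if __name__ == "__main__":
        cores = os.cpu_count() or 1
        for w in sorted({cores, NUM_TASKS}):  # pool sized to cores vs. oversubscribed
            print(f"{w:3d} workers: average completion {average_completion(w):.2f} s")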

One final point: the "optimal" number of processes may even be lower than the number of cores if your processes saturate some other resource before you reach that point. If, for example, each process needs a huge amount of memory, you may start paging and swapping (ouch). If each process works on a large dataset that it reuses many times, you may fill your shared caches and start thrashing them as you add more. The same goes for heavy IO, and so on.
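The cache and memory effects can be measured the same way: give each worker a working set of a chosen size, stream over it repeatedly, and watch the per-worker time degrade once the combined working sets overflow the shared cache or saturate memory bandwidth. A rough sketch of that measurement, assuming NumPy is available; the sizes and pass counts are placeholders.

    import multiprocessing as mp
    import time

    import numpy as np

    WORKING_SET_MB = 64   # per-process working set; placeholder size
    PASSES = 50           # how many times each worker streams over its data

    def worker(result_queue):
        data = np.ones(WORKING_SET_MB * 1024 * 1024 // 8)  # float64 array
        start = time.perf_counter()
        total = 0.0
        for _ in range(PASSES):
            total += data.sum()  # streams over the whole working set
        result_queue.put(time.perf_counter() - start)

    def mean_worker_time(num_workers):
        q = mp.Queue()
        procs = [mp.Process(target=worker, args=(q,)) for _ in range(num_workers)]
        for p in procs:
            p.start()
        times = [q.get() for _ in procs]
        for p in procs:
            p.join()
        return sum(times) / len(times)

    if __name__ == "__main__":
        for n in (1, 2, 4, 8):
            print(f"{n} workers: {mean_worker_time(n):.2f} s each on average")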

As you can see, there is no right or wrong answer here; you just have to test your code on the systems you plan to run it on.
