Setting the ideal thread pool size

What's the difference between

newSingleThreadExecutor vs. newFixedThreadPool(20)

in terms of the operating system and programming?

Whenever I run my program with newSingleThreadExecutor, it works very well and the end-to-end latency (95th percentile) is around 5 ms.

But as soon as I run my program using newFixedThreadPool(20), performance degrades and I see an end latency of 37 ms.

So now I'm trying to understand from an architectural point of view, what does thread count mean here? And how do I decide what is the optimal number of threads I should choose?

And if I use more threads then what happens?

If anyone can explain these things to me in layman's terms, it would be very helpful. Thanks for the help.

My machine config (I am running my program on a Linux machine):

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping        : 7
cpu MHz         : 2599.999
cache size      : 20480 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm arat pln pts
bogomips        : 5199.99
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping        : 7
cpu MHz         : 2599.999
cache size      : 20480 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm arat pln pts
bogomips        : 5199.99
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

      

+14


3 answers


OK. Ideally, if your threads are non-blocking, so that they don't block each other (they are independent of each other), and you can assume that the workload (processing) is the same for each task, then a pool size of Runtime.getRuntime().availableProcessors() or availableProcessors() + 1 gives the best results.
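A minimal sketch of that sizing rule for CPU-bound work (the class name, task count, and workload here are made up for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CpuBoundPool {
    // For CPU-bound, non-blocking tasks, a pool of N or N + 1 threads
    // (N = available cores) is a common starting point.
    static int suggestedPoolSize() {
        return Runtime.getRuntime().availableProcessors() + 1;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(suggestedPoolSize());
        for (int i = 0; i < 8; i++) {
            final int task = i;
            pool.submit(() -> {
                long sum = 0;                              // purely CPU-bound work
                for (long j = 0; j < 1_000_000; j++) sum += j;
                System.out.println("task " + task + " done, sum = " + sum);
            });
        }
        pool.shutdown();
    }
}
```

With more threads than that for pure computation, you only add context-switch overhead without adding usable CPU.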

But if your threads interfere with each other or involve I/O, then Amdahl's law explains it pretty well. From the wiki:

Amdahl's law states that if P is the proportion of a program that can be made parallel (i.e. benefit from parallelization) and (1 - P) is the proportion that cannot be parallelized (remains sequential), then the maximum speedup that can be achieved with N processors is

S(N) = 1 / ((1 - P) + P / N)
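As a quick sanity check (a sketch, not part of the original answer), the formula can be computed directly; note how little 20 threads help when half the program is sequential:

```java
public class Amdahl {
    // Maximum speedup with N processors when fraction P of the work
    // is parallelizable: S(N) = 1 / ((1 - P) + P / N).
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        // A program that is only 50% parallel cannot beat 2x,
        // no matter how many threads you throw at it.
        System.out.printf("P=0.50, N=20 -> %.2f%n", speedup(0.50, 20)); // ~1.90
        System.out.printf("P=0.95, N=20 -> %.2f%n", speedup(0.95, 20)); // ~10.26
    }
}
```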



In your case, based on the number of available cores and what your threads are doing exactly (pure computation? blocking I/O? blocked on some shared resource? etc.), you need to come up with a pool size based on the above parameters.

For example: a few months ago I was collecting data from numerous websites. My machine had 4 cores, and I had a pool size of 4. But since the operation was pure I/O and my network speed was decent, I realized that I got the best performance with a pool size of 7. And that's because the threads weren't competing for computing power, but for I/O. So I could exploit the fact that more threads than cores can keep the machine busy while some of them wait on I/O.
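A sketch of why that works, with the sleeps standing in for network I/O (the 7-thread / 14-task numbers are illustrative, echoing the anecdote above):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class IoBoundPool {
    // Run `tasks` jobs that each just wait `sleepMs` (simulated I/O)
    // on a pool of `threads` threads; return elapsed wall time in ms.
    static long runBatch(int threads, int tasks, long sleepMs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(tasks);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(sleepMs);              // simulated I/O wait
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // 14 tasks of ~100 ms "I/O" on 7 threads finish in roughly
        // 2 x 100 ms, not 14 x 100 ms, because the waits overlap.
        System.out.println(runBatch(7, 14, 100) + " ms");
    }
}
```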

PS: I suggest reading the performance chapter of the book Java Concurrency in Practice by Brian Goetz. It discusses such issues in detail.

+27


So now I'm trying to understand from an architectural point of view, what does thread count mean here?

Each thread has its own memory stack, program counter (i.e. a pointer to which instruction executes next), and other local resources. Switching between them adds latency to any one task. The advantage is that while one thread is idle (usually while waiting for I/O), another thread can get work done. In addition, if multiple processors are available, threads can run in parallel, as long as there is no resource contention and/or locking between tasks.

And how do I decide what is the optimal number of threads I should choose?



The trade-off between context-switch cost and avoiding idle time depends on the small details of what your task looks like (how much I/O, and when; how much work happens between I/O calls; how much memory it needs). Experimentation is always the key.
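One way to run that experiment is to time the same batch of work across a range of pool sizes (a sketch; the mixed sleep-plus-compute workload here is made up, so substitute your real tasks):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizeExperiment {
    // Time how long a fixed batch of mixed I/O + CPU tasks takes
    // with a given pool size; return elapsed wall time in ms.
    static long timeBatch(int poolSize, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch done = new CountDownLatch(tasks);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(20);                   // simulated I/O
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                long x = 0;                             // a little CPU work
                for (int j = 0; j < 100_000; j++) x += j;
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sweep pool sizes and watch where the curve flattens out.
        for (int size : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf("pool=%2d -> %d ms%n", size, timeBatch(size, 32));
        }
    }
}
```

Plotting those numbers for your real workload shows exactly where adding threads stops helping.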

And if I use more threads then what happens?

There will usually be a roughly linear increase in throughput at first, then a relatively flat part, then a drop (which can be quite steep). Every system is different.

+4


Looking at Amdahl's law is good, especially if you know exactly how big P and N are. Since that will never be the case, you can monitor performance (which you should be doing anyway) and increase/decrease the thread pool size to optimize whatever performance metric matters to you.

+1






