Should I disable Hyper-Threading for parallel modeling?
My computer has a quad-core Intel i7 processor. I am learning about parallelizing scientific simulations. How does Hyper-Threading affect parallel workloads? I know I should never use more than 4 threads to get peak performance. But should Hyper-Threading be disabled? Does it have an impact on parallel performance?
In my experience running EM and inversion codes, the answer is yes, you should disable hyper-threading. But this is not a question that other people's anecdotes (even mine, fascinating and true as they are) answer well.
You are a student, so this topic is definitely worth your time to investigate and reach your own conclusions. There are so many factors involved that my experience with my codes on my platforms is almost useless to you.
On Linux, if you have 4 busy threads on the i7, the scheduler should place each of them on a different physical core. If the other hyperthread on each core is idle, performance should be the same as with HT disabled. If you are also running another program, the question is whether it is better to run it on the spare hyperthreads or to let the OS context-switch it with your threads. (I suspect it's better to avoid the context switches.)
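If you want to see which logical CPUs are hyperthread siblings on the same physical core, Linux exposes this in sysfs. Below is a minimal Python sketch, assuming the standard `/sys/devices/system/cpu/cpu*/topology/thread_siblings_list` files are present; the helper names `parse_cpu_list` and `one_cpu_per_core` are hypothetical. Restricting your workers to one logical CPU per core (with the Linux-only `os.sched_setaffinity`) roughly mimics running with HT disabled, without a trip to the BIOS.

```python
import glob
import os
from typing import List, Set


def parse_cpu_list(text: str) -> List[int]:
    """Parse a Linux CPU list such as '0,4' or '0-3,8-11' into sorted CPU ids."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core() -> List[int]:
    """Return the lowest-numbered logical CPU of each physical core (Linux sysfs only).

    Returns an empty list if the sysfs topology files are not available.
    """
    chosen: Set[int] = set()
    # cpu[0-9]* matches cpu0, cpu1, ... but not cpufreq/cpuidle.
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        with open(path) as f:
            siblings = parse_cpu_list(f.read())
        if siblings:
            chosen.add(siblings[0])
    return sorted(chosen)


if __name__ == "__main__":
    cores = one_cpu_per_core()
    print(f"{os.cpu_count()} logical CPUs; one per physical core: {cores}")
    # Optional, Linux only: restrict this process (and programs it starts
    # afterwards, which inherit the affinity) to one hyperthread per core.
    # os.sched_setaffinity(0, cores)
```

On a quad-core i7 with HT, `os.cpu_count()` typically reports 8 logical CPUs while the list above has 4 entries.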
It is a common mistake to assume that using 8 threads instead of 4 will make your program twice as fast. It may be only slightly faster (in which case it might still be worth it) or slightly slower (in which case, limit your program to 4 threads). I have found cases where doubling the number of threads was slightly faster. IMHO, the whole question comes down to testing it, finding the optimal number of threads, and using that.
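One way to "test it" is to sweep the number of simultaneous workers and compare throughput. This is a minimal Python sketch, assuming each worker is an independent copy of your program; the job command and the function names are hypothetical placeholders. It measures separate processes rather than threads inside one program; for a threaded code you would instead vary the program's own thread count.

```python
import os
import subprocess
import sys
import time
from typing import Dict, List, Tuple


def jobs_per_second(command: List[str], workers: int) -> float:
    """Run `workers` copies of `command` at the same time; return completed jobs per second."""
    start = time.perf_counter()
    procs = [subprocess.Popen(command) for _ in range(workers)]
    for p in procs:
        if p.wait() != 0:
            raise RuntimeError(f"{command} exited with status {p.returncode}")
    return workers / (time.perf_counter() - start)


def best_worker_count(command: List[str], max_workers: int = 0) -> Tuple[int, Dict[int, float]]:
    """Try 1..max_workers simultaneous copies (default: all logical CPUs).

    Returns the worker count with the highest throughput and the full table.
    """
    max_workers = max_workers or os.cpu_count() or 1
    results = {n: jobs_per_second(command, n) for n in range(1, max_workers + 1)}
    return max(results, key=results.get), results


if __name__ == "__main__":
    # Hypothetical CPU-bound stand-in for one simulation run; replace with your own program.
    job = [sys.executable, "-c", "sum(i * i for i in range(5_000_000))"]
    best, table = best_worker_count(job)
    for n, rate in table.items():
        print(f"{n:2d} workers: {rate:.2f} jobs/s")
    print("best:", best)
```

A single timing per worker count is noisy, so repeat the sweep before trusting small differences between, say, 4 and 8 workers.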
The only case where I see a need to disable HT is when you have no control over how the application behaves and it runs faster with 4 threads.
Hyper-Threading is Intel's implementation of simultaneous multithreading (SMT). In general, SMT is almost always beneficial (which is why it is usually enabled) if your application is not CPU-bound. If you know for sure that your application is CPU-bound, disable SMT. Otherwise (your application is I/O-bound or cannot fully saturate the cores), leave it enabled.
You state:
I know I should never use more than 4 threads to get peak performance.
Not necessarily! Here is an example of what I found on an i7-3820 with HT enabled. All the code I ran was C++. Note that I had 8 separate (albeit identical) programs that I needed to run. I tried the following two ways of running them:
- Run only 4 separate threads at a time. When those 4 complete, start the next 4 (4 x 2 = 8 total).
- Run all 8 as separate threads at the same time (8 x 1 = 8 total).
As you can see, both scenarios do the same amount of work. However, the run times I measured were:
- 1 hour for each set of 4 threads, i.e. 2 hours to complete all 8.
- 1.5 hours for the single set of 8 threads.
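The two schedules described above can be reproduced with a small launcher. This is a minimal Python sketch, assuming each simulation is an independent executable; the busy-loop job and the name `run_in_batches` are hypothetical placeholders, not the code used for the timings above. Case #1 corresponds to `batch_size=4` and case #2 to `batch_size=8`.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def run_in_batches(commands: Sequence[List[str]], batch_size: int) -> float:
    """Start up to `batch_size` programs at once, wait for the whole batch, then start the next.

    Returns the wall-clock time in seconds for all commands.
    """
    start = time.perf_counter()
    for i in range(0, len(commands), batch_size):
        batch = [subprocess.Popen(cmd) for cmd in commands[i:i + batch_size]]
        for proc in batch:
            if proc.wait() != 0:
                raise RuntimeError(f"{proc.args} exited with status {proc.returncode}")
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for one of the 8 identical simulation programs.
    job = [sys.executable, "-c", "sum(i * i for i in range(20_000_000))"]
    jobs = [job] * 8
    t1 = run_in_batches(jobs, batch_size=4)  # case #1: 4 at a time, two rounds
    t2 = run_in_batches(jobs, batch_size=8)  # case #2: all 8 at once
    print(f"case #1: {t1:.1f} s, case #2: {t2:.1f} s")
```

Because the whole batch must finish before the next one starts, a single slow job in case #1 leaves cores idle; case #2 has no such barrier.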
What you will find is that an individual thread finishes faster in case #1, but case #2 gives better overall performance because ALL of your work completes in less time. I found that typical performance gains are ~25% with HT enabled.
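Working through the run times above: the ~25% figure is the reduction in total wall-clock time for the 8 jobs. Expressed as throughput (jobs completed per hour), the same measurements correspond to roughly a 33% gain:

```latex
\begin{aligned}
T_{4\times 2} &= 2 \times 1\,\text{h} = 2\,\text{h}, \qquad T_{8\times 1} = 1.5\,\text{h},\\
\text{time saved} &= \frac{T_{4\times 2} - T_{8\times 1}}{T_{4\times 2}} = \frac{2 - 1.5}{2} = 25\%,\\
\text{throughput ratio} &= \frac{8 / T_{8\times 1}}{8 / T_{4\times 2}} = \frac{2}{1.5} \approx 1.33.
\end{aligned}
```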
As you can see, there are scenarios where starting 8 threads is faster than starting 4.