Scientific computing :: OpenMP or Pthreads

I am developing codes for the scientific computing community, especially for solving a linear system of equations (Ax = b form) iteratively.

I've used BLAS and LAPACK for the basic matrix routines, but now I see several opportunities for manual parallelization. I'm working on a shared-memory system, which leaves me with two options: OpenMP and Pthreads.

Assuming that development time (and not raw code performance) is the biggest factor, what is the best, most future-proof, and most portable (possibly toward CUDA) way to parallelize? Would the extra time spent on Pthreads pay off in better performance?

I believe that my application (which basically starts several computations at once and then works with the "best" value among them) would benefit from explicit flow control, but I'm afraid the coding will take too long and there will be no payoff in the end.
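Roughly, the pattern I have in mind looks like this (a minimal sketch in C with pthreads; `run_solver` and its residual metric are placeholder stand-ins, not my real code):

```c
#include <pthread.h>
#include <stdio.h>

#define NSOLVERS 4

/* Placeholder: run one solver variant, return its residual norm. */
static double run_solver(int variant) {
    return 1.0 / (variant + 1);  /* dummy quality metric */
}

typedef struct { int variant; double residual; } task_t;

static void *worker(void *arg) {
    task_t *t = (task_t *)arg;
    t->residual = run_solver(t->variant);
    return NULL;
}

int main(void) {
    pthread_t threads[NSOLVERS];
    task_t tasks[NSOLVERS];

    /* Start several solver variants at once... */
    for (int i = 0; i < NSOLVERS; i++) {
        tasks[i].variant = i;
        pthread_create(&threads[i], NULL, worker, &tasks[i]);
    }

    /* ...then keep the "best" (smallest-residual) result. */
    int best = 0;
    for (int i = 0; i < NSOLVERS; i++) {
        pthread_join(threads[i], NULL);
        if (tasks[i].residual < tasks[best].residual)
            best = i;
    }
    printf("best variant: %d (residual %g)\n", best, tasks[best].residual);
    return 0;
}
```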

I have already looked through several similar questions, but they all relate to general-purpose applications.

One of them refers to a generic multi-threaded application on Linux.

Another asks the same question in general terms.

I know about SciComp.SE, but I felt there was more expertise on this topic here.

+3




3 answers


Your question reads as if you expect coding efficiency to be better with OpenMP than with Pthreads, and execution efficiency to be better with Pthreads than with OpenMP. In general, I think you are right. However, some time ago I decided that my time was more valuable than my computer's time and chose OpenMP. It is not a decision I have regretted, but neither is it one I can back up with hard measurements.
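To illustrate the coding-efficiency point, here is a minimal sketch (mine, not from any particular library) of the kind of kernel an iterative solver spends its time in. With OpenMP a single pragma does the job; the equivalent pthreads version needs explicit thread creation, work splitting, and joining:

```c
/* y = A*x for a dense n-by-n matrix stored row-major.
   Compile with an OpenMP flag (e.g. -fopenmp); without it
   the pragma is ignored and the loop runs serially. */
void matvec(int n, const double *A, const double *x, double *y) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++)
            sum += A[i * n + j] * x[j];
        y[i] = sum;
    }
}
```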

However, you are wrong to assume that your choices are limited to OpenMP and Pthreads: MPI (I assume you have at least heard of it; post again if not) also works on shared-memory computers. For some applications, MPI programs can easily outperform OpenMP ones on shared-memory machines.
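For example, the global dot product inside an iterative solver maps naturally onto MPI even on a single shared-memory node (a minimal sketch, assuming each rank already owns a local slice of the vectors and `MPI_Init` has been called):

```c
#include <mpi.h>

/* Global dot product: each rank computes its local part,
   MPI_Allreduce combines them. The same code runs on one
   shared-memory node or across a cluster. */
double dot(int nlocal, const double *x, const double *y) {
    double local = 0.0, global = 0.0;
    for (int i = 0; i < nlocal; i++)
        local += x[i] * y[i];
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return global;
}
```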



Three (give or take a few) years ago, the main parallelization tools in the scientific toolbox were OpenMP and MPI. Anyone using those tools was part of a large community of fellow users, larger (anecdotal evidence only) than the community of Pthreads users. Today, with GPUs and other accelerators popping up all over the place, the situation is much more fragmented, and it is difficult to pick a winner from among HMPP, OpenACC, Chapel, MPI-3, OpenMP 4, CUDA, OpenCL, etc. I still think OpenMP + MPI is a useful combination, but you cannot ignore the new kids on the block.

FWIW, I work on developing computational EM codes for geophysical applications, so that counts as fairly hard-core "scientific computing".

+7




I realize that my answer is quite long, so I state the conclusion first for the impatient:

The short answer is:

I would say that OpenMP and pthreads are essentially equivalent, and that you should choose whichever takes the least development time (probably OpenMP, if it suits your needs). But if you do want to invest development time, you may want to restructure your code so that it can adapt to other paradigms (for example, vectorization to use SSE/AVX, or GPUs).

The longer version:



If you are developing linear solvers, I assume your code will be (very) long-lived (i.e., it will probably outlive the physical models that use it). In such circumstances, especially if you do not have a large development team, I think you should base your choice primarily on development time, maintainability, and portability.

Also, you should assume that the "best" choice today (whatever "best" may mean) will probably not be the "best" choice tomorrow. So even if you currently face the OpenMP vs. pthreads question (and even now the spectrum is larger than the one given in @HighPerformanceMark's answer), you should expect to have more alternatives to choose from in the future.

If you have development time to spend now, I would say it is best invested in abstracting all the computationally intensive kernels of your code, so that you can easily adapt them to different parallelization paradigms. In this respect the most important (and hardest) challenge is the data structure: profiting from coalesced memory access in GPGPU computations requires laying your data out in a different order than traditional cache optimization does.
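To make the layout point concrete, here is a hypothetical sketch contrasting the two orders:

```c
/* Array-of-structures: the fields of one element are contiguous.
   Cache-friendly when a CPU thread touches all fields of an element. */
typedef struct { double x, y, z; } point_aos_t;
point_aos_t points_aos[1024];

/* Structure-of-arrays: each field is contiguous across elements.
   This is what GPUs need so that adjacent threads read adjacent
   memory (coalesced access), and it also vectorizes better
   with SSE/AVX. */
typedef struct {
    double x[1024];
    double y[1024];
    double z[1024];
} points_soa_t;
points_soa_t points_soa;
```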

This leads me to the conclusion that all the threading approaches are essentially equivalent (in terms of both performance and code architecture), and that you should choose whichever requires the least development time. But if you do want to invest development time, perhaps you should redesign your code so that it can be both parallelized and vectorized (and thus exploit SSE/AVX or GPUs). If you can do that, you will be able to follow hardware and software evolution and maintain performance.
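For instance, a kernel written as a clean loop over contiguous arrays can be handed to both the threading runtime and the vectorizer (a minimal sketch, assuming a compiler that supports the OpenMP 4.0 `simd` clause):

```c
/* axpy: y = a*x + y, the kind of kernel an iterative solver
   calls constantly. Written this way, the same loop can be
   threaded, vectorized (SSE/AVX), or both at once. */
void axpy(int n, double a, const double *restrict x, double *restrict y) {
    #pragma omp parallel for simd
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```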

+1




To add to the already great answers: in my experience, OpenMP usually does a better job of parallelizing my code than the pthreads code I would write by hand. Given that OpenMP is also simpler, I always pick it when it is an option. I suspect that if you are asking this question, you are not a pthreads guru, so I would recommend OpenMP over pthreads as well.

0








