Multi-threaded FFTW 3.1.2 on a shared memory computer

I am using FFTW 3.1.2 with Fortran to do real-to-complex and complex-to-real FFTs. It works fine on one thread.

Unfortunately, I run into problems when I use multi-threaded FFTW on a shared-memory machine with 32 CPUs. I have two plans, one for 9 real-to-complex FFTs and one for 9 complex-to-real FFTs (size of each real field: 512 * 512). I compile my Fortran code with ifort, referencing the following libraries:

-lfftw3f_threads -lfftw3f -lm -lguide -lpthread -mp

The program seems to compile correctly, and the function sfftw_init_threads returns a nonzero integer value, usually 65527.

However, while the program runs correctly, it is slower with 2 or more threads than with one. The top command shows strange CPU utilization over 100% (and much more than n_threads * 100). The htop command shows that one processor (say, number 1) is running the program at 100% load, while ALL other processors, including number 1, also list this same program at 0% load, 0% memory and 0 TIME.

If anyone knows what's going on here ... thanks a lot!



2 answers


It looks like it might be a synchronization issue. You can get this type of behavior if all threads but one are blocked, for example by a semaphore around a library call.



How are you calling the planner? Are all your functions synchronized correctly? Are you creating the plans in one thread or on all threads? I assume you have read the thread-safety notes in the FFTW docs ... ;)



Unless your FFTs are fairly large, automatic multithreading in FFTW is unlikely to be a win in terms of speed. Synchronization overhead within the library can dominate the computation. You have to profile different sizes and see where the break-even point is.
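
One way to look for that break-even point is a small timing harness. The sketch below is in Python only to keep it short and is not tied to FFTW: make_single and make_threaded are hypothetical factories that you would write yourself, each returning a zero-argument callable that executes an already-created plan for a given size. It measures wall-clock time on purpose, since CPU time is summed over threads and would hide a slowdown.

```python
import time


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time (seconds) of fn() over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # wall clock, not CPU time summed over threads
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def collect_timings(sizes, make_single, make_threaded, repeats=5):
    """Time both variants for each size.

    make_single(size) and make_threaded(size) are hypothetical factories that
    return a zero-argument callable running the transform for that size.
    """
    return {
        size: (best_time(make_single(size), repeats),
               best_time(make_threaded(size), repeats))
        for size in sizes
    }


def break_even_size(timings):
    """Smallest measured size from which threading wins at every larger size.

    timings maps size -> (seconds_single, seconds_threaded).
    Returns None if the threaded run is not faster at the largest size.
    """
    result = None
    for size in sorted(timings, reverse=True):
        single, threaded = timings[size]
        if threaded >= single:
            break
        result = size
    return result
```

Plan creation is best kept outside the timed callable, because planning can take far longer than executing the transform and would otherwise swamp the measurement.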


