How do I determine if the Python Multiprocessing module is using all my cores for calculations?

I have simple code from a tutorial like this:

from multiprocessing import Process, Lock
import os

def f(i):
    print 'hello world', i
    print 'parent process:', os.getppid()
    print 'process id:', os.getpid(), "\n\n"

if __name__ == '__main__':
    lock = Lock()

    for num in range(10):
        p = Process(target=f, args=(num,))
        p.start()
    p.join()

      

How can I tell whether this is using both of my cores? I am currently running Ubuntu 11.04 with 3 GB RAM and an Intel Core 2 Duo at 2.2 GHz.

The project I am researching this for will eventually be carried over to a much bigger machine in someone's office, with far more power than I currently have. Specifically, the processor will have at least 4 cores, and I want to be sure my algorithm automatically detects and uses all the cores available. Also, that system may well be something other than Linux, so are there any general considerations I should keep in mind when moving multiprocessing code between operating systems?

Oh yes, and the script output looks something like this:

hello world 0
parent process: 29362
process id: 29363 


hello world 1
parent process: 29362
process id: 29364 


hello world 2
parent process: 29362
process id: 29365 

and so on...

      

So, from what I know so far: the PPIDs are all the same because the script itself, when run, is the parent process that spawns the child processes, each of which gets its own process ID. Does multiprocessing automatically detect and use multiple cores, and if not, where should I look? Also, from what I read while searching for a duplicate of this question, I shouldn't spawn more processes than there are cores, because the extras just burn system resources that would otherwise be used for computation.

Thanks in advance for your help, my thesis loves you.

+3




3 answers


Here is a small command I use to monitor my cores from the command line:

watch -d "mpstat -P ALL 1 1 | head -n 12"

      

Note that the mpstat command must be available on your system; on Ubuntu you can get it by installing the sysstat package:



sudo apt-get install sysstat

      

If you want to determine the number of available cores from Python, you can do so with multiprocessing.cpu_count(). On Intel processors with Hyper-Threading, this number will be double the number of physical cores. Launching as many processes as you have available cores will tend to keep every core on your machine busy, provided the processes have enough work to do and don't get bogged down in communication. The Linux process scheduler takes it from there.
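For instance, a minimal sketch that launches one worker per reported core (the workload function here is just a placeholder for a real calculation):

import multiprocessing

def work(i):
    # Placeholder CPU-bound workload; substitute your real calculation here.
    return sum(n * n for n in range(1000000))

if __name__ == '__main__':
    n_cores = multiprocessing.cpu_count()  # logical cores, counting Hyper-Threading siblings
    procs = [multiprocessing.Process(target=work, args=(i,)) for i in range(n_cores)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

With mpstat running in another terminal, you should see all the cores light up while this runs.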

+3




A bit about your sample code: you never actually use the lock you create, and you only join the last process you started. Right now the children probably finish so quickly that you won't see a problem, but if any of the earlier processes took longer than the last one, I suspect your script could carry on before they were done.
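A minimal sketch of the fixed loop, keeping a reference to every child so that each one gets joined (f is the same placeholder worker as in your question):

from multiprocessing import Process
import os

def f(i):
    print('hello world %d, process id: %d' % (i, os.getpid()))

if __name__ == '__main__':
    processes = []
    for num in range(10):
        p = Process(target=f, args=(num,))
        p.start()
        processes.append(p)

    # Join every child, not just the last one started.
    for p in processes:
        p.join()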

As for getting each process to end up on a different core: sorry, you can't control that. It's a decision the operating system's scheduler makes. You just write code that uses multiple processes so that the system is able to schedule them in parallel; some of them may land on the same core.



Pitfalls (pratfalls?): it may be that your actual code doesn't really need multiple processes at all and would do much better with threads instead. Also, you have to be very careful about how you share memory with multiprocessing; there is a lot more overhead in inter-process communication than in communication between threads. So it's usually reserved for the cases where threads simply won't get you what you need.
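To illustrate that communication cost, here is a sketch (with a made-up squaring workload) that passes results back to the parent through a multiprocessing.Queue; every item has to be pickled and pushed through a pipe, which is much more expensive than threads reading shared memory directly:

from multiprocessing import Process, Queue

def worker(numbers, results):
    for n in numbers:
        # Each put() pickles the item and writes it to a pipe.
        results.put((n, n * n))

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(range(5), q))
    p.start()
    # Drain the queue before joining, so the child is never left
    # blocked on a full pipe buffer.
    out = [q.get() for _ in range(5)]
    p.join()
    print(out)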

+1




If you are on a Unix system, you can try running the top command and watching how many of your processes are running at the same time. It's somewhat empirical, but often just looking at the process list will show you the multiple workers.

Looking at your script, though, I can't quite see where you are putting multiple processes to work on an actual calculation. You can import multiprocessing.Pool and then map your function across different worker processes, as in the sketch below.
http://docs.python.org/library/multiprocessing.html
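A minimal sketch of that approach, with a placeholder squaring function standing in for real work:

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool()  # defaults to cpu_count() worker processes
    results = pool.map(square, range(100))  # splits the iterable across the workers
    pool.close()
    pool.join()
    print(results[:10])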

0








