Trying to understand multiprocessing with main in Python
Using the code below, I am getting weird output:
import sys
from multiprocessing import Process
import time
from time import strftime

now = time.time()
print time.strftime("%Y%m%d %H:%M:%S", time.localtime(now))

fr = [1, 2, 3]
for row in fr:
    print 3
print 1

def worker():
    print 'worker line'
    time.sleep(1)
    sys.exit(1)

def main():
    print 'start worker'
    Process(target=worker, args=()).start()
    print 'main line'

if __name__ == "__main__":
    start_time = time.time()
    main()
    end_time = time.time()
    duration = end_time - start_time
    print "Duration: %s" % duration
Output:
20120324 20:35:53
3
3
3
1
start worker
main line
Duration: 0.0
20120324 20:35:53
3
3
3
1
worker line
I thought I would get this:
20120324 20:35:53
3
3
3
1
start worker
worker line
main line
Duration: 1.0
Why does the script run twice? I am using Python 2.7 on 64-bit Windows, and this part of the output appears a second time:
20120324 20:35:53
3
3
3
1
worker line
The problem is basically that multiprocessing is really designed to run on a POSIX system, one with the fork(2) syscall. On those operating systems a process can split in two: the child clones the state of the parent, both resume running at the same point, and the child simply has a new process ID. In that situation multiprocessing can arrange some mechanism to ship state from parent to child as needed, with the certainty that the child already has most of the Python state it needs.
Windows doesn't have fork().
So multiprocessing has to pick up the slack. This basically involves launching a brand-new Python interpreter running a multiprocessing child script. Almost immediately, the parent asks the child to use something that lives in the parent's state, so the child has to recreate that state from scratch by importing your script.
So, whatever happens during import in your script will happen twice, once in the parent and again in the child, as it recreates the python environment it needs to serve the parent.
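One way to avoid the double run is to keep import-time side effects out of the module body and move them behind the __main__ guard the question already uses. The following is only a minimal sketch, assuming the timestamp and prints are meant to happen once; banner() is a hypothetical helper name, and the single-argument print(...) calls are chosen so the file also runs on Python 3:

```python
import time
from multiprocessing import Process


def banner(now):
    # Formatting lives in a function, so merely importing this file prints nothing.
    return time.strftime("%Y%m%d %H:%M:%S", time.localtime(now))


def worker():
    print("worker line")


def main():
    # Everything that should happen exactly once goes here.
    print(banner(time.time()))
    print("start worker")
    p = Process(target=worker)
    p.start()
    p.join()
    print("main line")


if __name__ == "__main__":
    # True only in the process launched as the script. When the Windows
    # child re-imports this file, the module has a different name, so
    # main() is skipped there and only the definitions are recreated.
    main()
```

With this layout the child still imports the script, but the import only defines functions, so nothing is printed a second time.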
This is what I get when I run your code on Linux using Python 2.7.3:
20120324 23:05:49
3
3
3
1
start worker
main line
Duration: 0.0045280456543
worker line
I don't know why it runs twice for you, but I can tell you why it doesn't report the expected duration or print in the "correct" order.
When you start a process with multiprocessing, the launch is asynchronous. That is, .start() returns immediately in the parent process, so the parent can carry on and do other things (for example, start more processes) while the child does its own work in the background. If you want the parent to block until the child exits, call .join(). For example:
def main():
    print 'start worker'
    p = Process(target=worker, args=())
    p.start()
    p.join()
    print 'main line'
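To see the difference in the measured duration, the two variants can be timed side by side. This is a rough sketch rather than the asker's code: run() and its wait parameter are hypothetical names introduced here, and the print calls are written so the file runs on Python 3 as well as 2.7. It also reads p.exitcode, which is where the worker's sys.exit(1) shows up in the parent:

```python
import sys
import time
from multiprocessing import Process


def worker():
    print("worker line")
    time.sleep(1)
    sys.exit(1)  # as in the question; the parent sees this as exitcode 1


def run(wait):
    """Start the worker; return (seconds measured in the parent, exit code seen)."""
    start_time = time.time()
    p = Process(target=worker)
    p.start()               # returns immediately, the child runs in the background
    if wait:
        p.join()            # block until the child has exited
    duration = time.time() - start_time
    exitcode = p.exitcode   # None while the child is still running
    p.join()                # always wait for the child before returning
    return duration, exitcode


if __name__ == "__main__":
    print("without join: %.2f s, exitcode %r" % run(wait=False))
    print("with join:    %.2f s, exitcode %r" % run(wait=True))
```

Without the join, the measured duration should be close to zero and the exit code is typically still None; with it, the duration should be about one second, matching what the question expected.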