How do I use Python multiprocessing classes?
Here is some sample code that reads a file and adds up each line. All numbers are assumed to be between 0 and 20. However, I always get the result 0.
I can see that the intermediate steps run correctly, so why is the end result 0?
Is there a better way to do this? I am trying to do more computation on a larger, more complex source file and keep statistics as I go.
import multiprocessing
import StringIO

class Total():
    def __init__(self):
        self.total = 0

    def add(self, number):
        self.total += int(number)

    def __str__(self):
        return str(self.total)

total = Total()

def f(input):
    total.add(input)

# Create mock file
mock_file = StringIO.StringIO()
for i in range(20):
    mock_file.write("{}\n".format(i))
mock_file.seek(0)

# Compute
pool = multiprocessing.Pool(processes=4)
pool.map(f, mock_file)
print total

# Cleanup
mock_file.close()
You can accomplish this with shared memory using multiprocessing.Value; just change the Total class to this:
class Total():
    def __init__(self):
        self.total = multiprocessing.Value('d', 0)

    def add(self, number):
        self.total.value += int(number)

    def __str__(self):
        return str(self.total.value)
Each subprocess that calls f updates its own copy of total, so the total in the main process is never affected.
You can instead have each subprocess return the result of its computation (in your mock example, the input is simply passed through unchanged) and accumulate the results in the main process. For example:
def f(input):
    return input

results = pool.map(f, mock_file)
for res in results:
    total.add(res)