How do I use Python multiprocessing classes?

Here is some sample code that reads a file and adds up each line. All values are assumed to be between 0 and 20. However, I always get the result 0.

I can see that the intermediate work is being done correctly, so why is the end result 0?

Is there a better way to do this? I am trying to do more computation on a larger, more complex source file and keep statistics as I go.

import multiprocessing
import StringIO

class Total():
    def __init__(self):
        self.total = 0

    def add(self, number):
        self.total += int(number)

    def __str__(self):
        return str(self.total)

total = Total()

def f(input):
    total.add(input)

# Create mock file
mock_file = StringIO.StringIO()
for i in range(20):
    mock_file.write("{}\n".format(i))
mock_file.seek(0)

# Compute
pool = multiprocessing.Pool(processes=4)
pool.map(f, mock_file)

print total  # expected 190 = sum(range(20)), but this always prints 0

# Cleanup
mock_file.close()

      

2 answers


You can accomplish this with shared memory using multiprocessing.Value; just change the Total class to this:



class Total():
    def __init__(self):
        # 'd' creates a double in shared memory, initialised to 0
        self.total = multiprocessing.Value('d', 0)

    def add(self, number):
        # += on a shared Value is not atomic, so take its built-in lock
        with self.total.get_lock():
            self.total.value += int(number)

    def __str__(self):
        return str(self.total.value)
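
This relies on the worker processes inheriting total when the Pool forks, which assumes a fork start method (the default on Linux); total must therefore be created before the Pool. A minimal end-to-end sketch using the modified Total class above (same mock file as in the question):

import multiprocessing
import StringIO

total = Total()              # must exist before the Pool so workers inherit it

def f(line):
    total.add(line)          # updates the shared Value, not a per-process copy

if __name__ == '__main__':
    mock_file = StringIO.StringIO()
    for i in range(20):
        mock_file.write("{}\n".format(i))
    mock_file.seek(0)

    pool = multiprocessing.Pool(processes=4)
    pool.map(f, mock_file)
    pool.close()
    pool.join()

    print total              # 190.0 = sum of 0..19, held in the shared double
    mock_file.close()

Under the spawn start method (e.g. on Windows), each worker re-imports the module and builds its own Value, so the sharing is lost; in that case, returning results from the workers as in the answer below is the safer route.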

      


Each subprocess that calls f updates its own copy of total, and therefore the total in the main process is never affected.

You can instead have each subprocess return the result of its computation (in your mock example the input is simply passed through unchanged) and accumulate the results in the main process. For example:



def f(input):
    # Return the value to the parent instead of mutating state in the worker
    return input

results = pool.map(f, mock_file)
for res in results:
    total.add(res)
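
This scales to the larger, more complex source file from the question: do the expensive per-line work in the workers, return each result, and keep all the statistics in the parent process. A minimal sketch of that pattern, where process_line and the squaring are made-up placeholders for whatever computation you actually need:

import multiprocessing
import StringIO

def process_line(line):
    # Placeholder for heavier per-line work; here we just square the value
    return int(line) ** 2

if __name__ == '__main__':
    mock_file = StringIO.StringIO()
    for i in range(20):
        mock_file.write("{}\n".format(i))
    mock_file.seek(0)

    pool = multiprocessing.Pool(processes=4)
    results = pool.map(process_line, mock_file)
    pool.close()
    pool.join()

    # All accumulation happens in the parent, so no shared state is needed
    print sum(results)       # 2470 = 0**2 + 1**2 + ... + 19**2
    mock_file.close()

If the file is too large to hold every result in memory at once, pool.imap (or pool.imap_unordered) can replace pool.map and the totals can be updated as each result arrives.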

      
