Using sys.stdout.write with pool.map to share sys.stdout across processes

This is probably something very basic that I am missing.

Why can't I use pool.map(sys.stdout.write, iterable)?

I can use pool.map(len, iterable)

with the same iterable, but when using sys.stdout.write

I get the following exception:

TypeError: expected string or Unicode object, NoneType found


This is the traceback:

Traceback (most recent call last):
  File "/home/reut/python/print_mult.py", line 19, in <module>
    pool.map(sys.stdout.write, messages)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
TypeError: expected string or Unicode object, NoneType found


Complete code:

#!/usr/bin/env python

import multiprocessing
import sys

# pool of 10 workers
pool = multiprocessing.Pool(10)
messages = ["message #%d\n" % i for i in range(100)]
print messages
pool.map(sys.stdout.write, messages) # doesn't work - error
# print pool.map(len, messages) # works


Edit # 1 - ThreadPool works:

When I use ThreadPool

(from multiprocessing.pool

) it works, so I guess it has something to do with not being able to share the sys.stdout

stream across processes.
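For completeness, a minimal sketch of the ThreadPool version I mean (it also runs under Python 3, where write() returns the number of characters written):

```python
from multiprocessing.pool import ThreadPool
import sys

pool = ThreadPool(4)
messages = ["message #%d\n" % i for i in range(5)]

# Threads share the parent's sys.stdout, so the bound method can be
# passed straight to map without any pickling.
results = pool.map(sys.stdout.write, messages)

pool.close()
pool.join()
```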

Edit # 2 - manual processes work as well:

from multiprocessing import Process
import sys

# pool of 10 workers
processes = []
for i in range(10):
    processes.append(Process(target=sys.stdout.write, args=("I am process %d" % i, )))

for p in processes:
    p.start()

for p in processes:
    p.join()


So now I'm confused, because the only difference I know of between a normal Process and a pool worker is that the pool forks its workers up front, and I'm not sure how that is relevant here. The only thing I can think of is that pool.map stores the target

internally and cannot share it with the workers the way the manual Process

constructor does.

+3




2 answers


The real error is hidden: multiprocessing has to pickle the target function, and you can only pass a function that is referable from the top level of a module. There are, however, cases that work around this limitation. Unix has a special feature whereby a process can be forked, duplicating all of its memory; this is how instance methods can reach a child process - nothing actually has to be pickled. On Windows, processes cannot be forked and must be spawned instead, which means a new interpreter is launched. To run the function, the new interpreter is sent the name of the function and the module it lives in; it imports the module and looks up the function before finally running it.

A process that is part of a pool is already running, so it cannot use forking to get a copy of the function or method it is asked to run. Instead, it must use the same pickling technique as a freshly spawned process. This is why your second edit works but the pool version does not.
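You can surface the hidden error yourself by trying to pickle the callable directly (a quick check; it fails under both Python 2 and 3, though the exact message differs):

```python
import pickle
import sys

# sys.stdout.write is a method bound to an open stream; pickling it
# forces pickle to serialize the underlying file object, which fails.
try:
    pickle.dumps(sys.stdout.write)
    failed = False
except (TypeError, pickle.PicklingError):
    failed = True

print("pickling sys.stdout.write raised an error: %s" % failed)
```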

The easiest way around your problem is to use print as a function rather than a statement.



from __future__ import print_function

import multiprocessing
import sys

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    messages = ["message #%d\n" % i for i in range(5)]
    print(messages) # <- notice the brackets around the arguments to print
    pool.map(print, messages)


Otherwise, you can define a function that does the printing for you and pass that function to map.

import multiprocessing 
import sys

def stdout_write(arg):
    sys.stdout.write(arg)

def stdout_print(arg):
    print arg

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    messages = ["message #%d\n" % i for i in range(5)]
    print messages
    pool.map(stdout_print, messages)


+3




I'm not sure why exactly, but pool.map()

seems to require the function to return a string.

This simple change to your program does the right thing.



#!/usr/bin/env python

import multiprocessing
import sys

def prn(s):
    sys.stdout.write(s)
    return ''

# pool of 10 workers
pool = multiprocessing.Pool(10)
messages = ["message #%d\n" % i for i in range(100)]
print messages
pool.map(prn, messages) # now works
# print pool.map(len, messages) # also works


I checked the documentation and I don't see this requirement, so I don't know where it comes from.

+2








