High memory usage with Python multiprocessing.Pool

I wrote a program that parses ~2600 text documents into Python objects. These objects have many references to other objects and generally describe the structure of the document.

Serializing these objects with pickle is no problem and super fast.

After parsing these docs into Python objects, I have to do some heavy computation on them that I would like to do in parallel. My current solution is passing one of these document objects to a worker function, which then does this heavy computation.

The results of these calculations are written to objects, which are attributes in the document object. The work function then returns those modified objects (only those attribute objects, not the original document object).

It all works with the following simplified code:

import multiprocessing

def worker(document_object):
    # Do calculations on the information in document_object, altering
    # objects which are attributes of document_object
    return document_object.attribute_objects

def get_results(attribute_objects):
    # Save results into the memory of the main process
    pass

# Parse documents
document_objects = parse_documents_into_python_objects()

# Divide the objects into smaller chunks (8 and smaller worked for me)
for chunk in chunker(document_objects, 8):
    pool = multiprocessing.Pool()
    pool.map_async(worker, chunk, callback=get_results)
    pool.close()
    pool.join()


However, there are several problems:

  • This only works when I pass small chunks of document_objects to map_async(). Otherwise, I get memory errors, even with 15GB of RAM.
  • htop tells me that only 2-3 of the 8 cores are being used.
  • I have a feeling this is not much faster than the single process version (I could be wrong about that).

I understand that every document_object needs to be pickled and copied into a worker process, and that map_async() keeps all of this data in memory until pool.join() happens.

I don't understand why it takes so much memory (up to ~12GB). When I pickle a single document_object to a file, the file turns out to be no more than 500KB.
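One way to check what actually crosses the process boundary is to measure the pickled size of a single task payload directly. A minimal sketch, with hypothetical Document and Section classes standing in for the real parsed objects:

```python
import pickle

class Section(object):
    """Hypothetical stand-in for an attribute object of a document."""
    def __init__(self, text):
        self.text = text

class Document(object):
    """Hypothetical stand-in for a parsed document object."""
    def __init__(self, sections):
        self.sections = sections

doc = Document([Section("section %d" % i) for i in range(100)])

# Roughly the payload multiprocessing sends per task:
payload = pickle.dumps(doc, protocol=pickle.HIGHEST_PROTOCOL)
print(len(payload))  # a few KB for this toy object
```

If the real per-task payload is also small, the memory growth is more likely to come from inputs and results accumulating in the pool until join() than from any single object.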

  • Why is this using up so much memory?
  • Why are only 2-3 cores used?
  • Is there a better way to do this? For example, is there a way to store the results in the main process right after a single worker call completes, so I don't have to wait for join() until all the memory is free again and the results arrive via the callback function?
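One pattern that addresses the last question is imap_unordered(), which yields each result to the main process as soon as its worker finishes, so results can be saved immediately instead of accumulating until join(). A sketch with a trivial stand-in for the heavy computation:

```python
import multiprocessing

def worker(document_object):
    # Hypothetical stand-in for the heavy computation; the real version
    # would return only document_object.attribute_objects.
    return document_object * document_object

def process_all(document_objects):
    pool = multiprocessing.Pool()  # one worker process per core by default
    results = []
    try:
        # Each result is handed back as soon as its task finishes, so the
        # main process can store it right away (the get_results logic) and
        # memory is bounded by in-flight tasks, not by the whole input.
        for result in pool.imap_unordered(worker, document_objects, chunksize=1):
            results.append(result)
    finally:
        pool.close()
        pool.join()
    return results

if __name__ == "__main__":
    print(sorted(process_all(range(10))))
    # prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Results arrive in completion order, not input order, so include an identifier in the return value if order matters.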

Edit: I'm using Python 2.7.6 on Ubuntu 14.04 and Debian Wheezy.

Edit: When I print the start and end of the worker function, as suggested in the comments, I get something like the following, which doesn't look parallel at all. Also, there are ~5 seconds between each end and the next start.

start <Process(PoolWorker-161, started daemon)>
end <Process(PoolWorker-161, started daemon)>
(~5 seconds delay)
start <Process(PoolWorker-162, started daemon)>
end <Process(PoolWorker-162, started daemon)>
(~5 seconds delay)
start <Process(PoolWorker-163, started daemon)>
end <Process(PoolWorker-163, started daemon)>
(~5 seconds delay)
start <Process(PoolWorker-164, started daemon)>
end <Process(PoolWorker-164, started daemon)>
(~5 seconds delay)
start <Process(PoolWorker-165, started daemon)>
end <Process(PoolWorker-165, started daemon)>



First of all, the problem was not in the simplified version of the code I posted here.

The problem was that I wanted to use an instance method as my worker function. This does not work out of the box in Python, as instance methods cannot be pickled. However, there is a well-known workaround by Steven Bethard that solves this problem (and which I used).

The problem with this workaround, however, is that it has to pickle the instance of the class that contains the worker method. In my case, this instance has attributes that are references to huge data structures, so every time the instance method was pickled, all of these huge data structures were copied around, leading to the problems described above.
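One way to keep such an instance cheap to pickle is to implement __getstate__() so the huge attributes are excluded whenever the instance (and therefore any bound method of it) is serialized. A sketch with a hypothetical class and attribute name:

```python
import pickle

class DocumentAnalyzer(object):
    """Hypothetical class whose instance method served as the worker."""

    def __init__(self):
        # Stand-in for the huge data structures the instance refers to.
        self.huge_index = list(range(200000))

    def analyze(self, document_object):
        # The instance method used as the worker function.
        return document_object

    def __getstate__(self):
        # Exclude the heavy attribute when the instance is pickled, so an
        # instance-method workaround no longer copies it to every worker.
        state = self.__dict__.copy()
        state.pop("huge_index", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.huge_index = None  # workers rebuild or skip this as needed

print(len(pickle.dumps(DocumentAnalyzer())))  # well under a kilobyte
```

Alternatively, making the worker a module-level function that receives only the data it needs avoids pickling the instance entirely.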

