Optimize network-bound multiprocessing code
I have a function that I am calling with multiprocessing.Pool
Like this:
from multiprocessing import Pool

def ingest_item(id):
    # makes a lot of network calls
    # and adds a bunch of rows to a remote db
    return None

if __name__ == '__main__':
    p = Pool(12)
    thing_ids = range(1000000)
    p.map(ingest_item, thing_ids)
There are over 1 million items in the pool.map list. Each ingest_item() call dispatches requests to third-party services and adds data to a remote PostgreSQL database.
On a 12-core machine, only about 1000 items get processed per 24 hours, with low CPU and RAM consumption.
How can I make it faster?
Could switching to Threads make sense since the bottleneck appears to be network calls?
Thanks in advance!
First: remember that this is a network-bound task. You should expect CPU and RAM usage to be low, because the network is orders of magnitude slower than your 12-core machine.
That said, spawning one process per request is wasteful. If you run into problems from starting too many processes, you can try pycurl, as suggested here: Library or tool to download multiple files in parallel
This pycurl example looks very similar to your task https://github.com/pycurl/pycurl/blob/master/examples/retriever-multi.py
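Since the work is I/O-bound, another lightweight option is a thread pool inside a single process: threads sit idle cheaply while waiting on the network, so you can run far more than 12 workers. A minimal sketch using the standard library's concurrent.futures, with ingest_item stubbed out (the real function would do the network calls and database writes):

```python
from concurrent.futures import ThreadPoolExecutor

def ingest_item(item_id):
    # Stub: the real version would call third-party services
    # and write to the remote database.
    return item_id

def run_all(ids, workers=50):
    # Threads share one process; 50+ workers are cheap for
    # I/O-bound work where each call mostly waits on the network.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ingest_item, ids))
```

The worker count here is an assumption to tune: raise it until the third-party services or your bandwidth become the limit, not your CPU.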
Using threads is unlikely to significantly improve performance on its own: no matter how you split the work, every request still has to go over the network.
To improve performance, check whether the third-party services offer some kind of bulk or high-volume request API.
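If such a bulk API exists, the usual pattern is to group the one million ids into fixed-size batches and send one request per batch instead of one per item. A small sketch of the batching helper (the batch size and the bulk endpoint are assumptions, not anything from the original post):

```python
def chunked(seq, size):
    # Yield successive fixed-size batches from a sequence,
    # e.g. for one bulk API call per batch instead of one
    # network round trip per item.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]
```

Each yielded batch would then be passed to whatever bulk endpoint the service exposes.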
If your workload allows it, you could also try some kind of caching. From your description of the task, though, it sounds like that would be ineffective, since you are mostly sending data rather than requesting it. What you can cache instead are open connections (if you don't already): keeping connections alive avoids repeating the slow TCP handshake for every request. This kind of connection reuse is common in web browsers (like Chrome).
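One way to reuse connections safely when combined with worker threads is to keep one long-lived session/connection object per thread. A minimal sketch using threading.local; the factory argument is a hypothetical callable that would open the real HTTP session or database connection:

```python
import threading

_local = threading.local()

def get_session(factory):
    # Create the session once per thread and reuse it on every
    # later call, instead of opening a new connection (and paying
    # a TCP handshake) for each request.
    if not hasattr(_local, "session"):
        _local.session = factory()
    return _local.session
```

Each worker thread then calls get_session(...) inside ingest_item and always gets the same already-open object back.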
Disclaimer: I have no Python experience