Python using multiprocessing

I am trying to use multiprocessing in Python 3.6. I have a for loop that launches a method with different arguments. It currently runs one call at a time, which is quite time consuming, so I'm trying to use multiprocessing. Here's what I have:

def test(self):
    for key, value in dict.items():
        pool = Pool(processes=(cpu_count() - 1))
        pool.apply_async(self.thread_process, args=(key,value))
        pool.close()
        pool.join()


def thread_process(self, key, value):
    # self.__init__()
    print("For", key)

      

I think my code uses 3 processes to run one method, but I would like to run one method per process, and I don't know how that is done. I have 4 cores, by the way.



3 answers


You create a pool on every iteration of the for loop. Instead, create the pool ahead of time, apply all the calls you want to run in parallel, and then close and join it:



from multiprocessing import Pool, cpu_count
import time

def t():
    # Make a dummy dictionary
    d = {k: k**2 for k in range(10)}

    pool = Pool(processes=(cpu_count() - 1))

    for key, value in d.items():
        pool.apply_async(thread_process, args=(key, value))

    pool.close()
    pool.join()


def thread_process(key, value):
    time.sleep(0.1)  # Simulate a process taking some time to complete
    print("For", key, value)

if __name__ == '__main__':
    t()
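
If you also need the workers' return values, each apply_async() call returns an AsyncResult whose .get() blocks until that task has finished. A minimal sketch along the same lines (the square worker here is just a placeholder):

from multiprocessing import Pool, cpu_count

def square(key, value):
    # Placeholder worker that returns something instead of printing
    return key, value * value

if __name__ == '__main__':
    d = {k: k + 1 for k in range(10)}
    pool = Pool(processes=(cpu_count() - 1))
    # Collect the AsyncResult handles first ...
    handles = [pool.apply_async(square, args=(k, v)) for k, v in d.items()]
    pool.close()
    pool.join()
    # ... then fetch the results; .get() also re-raises any worker exception
    print([h.get() for h in handles])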

      



You are not populating your multiprocessing.Pool with data; you are reinitializing the pool on every loop iteration. In your case, you can use Pool.map() to do all the hard work for you:

from multiprocessing import Pool, cpu_count

def thread_process(args):
    # Pool.map passes each (key, value) tuple as a single argument
    print(args)

def test():
    pool = Pool(processes=(cpu_count() - 1))
    pool.map(thread_process, your_dict.items())  # your_dict: your data
    pool.close()
    pool.join()

if __name__ == "__main__":  # important guard for cross-platform use
    test()
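
If you would rather have the worker take key and value as separate parameters, Pool.starmap() (Python 3.3+) unpacks each (key, value) tuple for you. A minimal sketch, with a dummy dictionary standing in for the real data:

from multiprocessing import Pool, cpu_count

def thread_process(key, value):
    print(key, value)

if __name__ == "__main__":
    your_dict = {k: k**2 for k in range(10)}  # dummy stand-in for the real data
    with Pool(processes=(cpu_count() - 1)) as pool:
        pool.starmap(thread_process, your_dict.items())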

      



Also, given all those self arguments, I believe you are grabbing these methods from a class instance; if so, don't do that unless you know what you are doing. Since multiprocessing in Python essentially works as, well, multiple processes (as opposed to multiple threads), you cannot share memory: your data is pickled when exchanged between processes, which means anything that cannot be pickled (like instance methods) is not callable. You can read more about this problem in this answer.
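
One common way around this, sketched below with hypothetical names, is to keep the worker function at module level so that only plain (key, value) data crosses the process boundary, not the instance itself:

from multiprocessing import Pool, cpu_count

def thread_process(item):
    # Module-level worker: only the (key, value) tuple is pickled, not self
    key, value = item
    print("For", key, value)

class Example:
    def __init__(self):
        self.d = {k: k**2 for k in range(10)}

    def test(self):
        with Pool(processes=(cpu_count() - 1)) as pool:
            pool.map(thread_process, self.d.items())

if __name__ == '__main__':
    Example().test()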



I think my code uses 3 processes to run one method, but I would like to run one method per process, and I don't know how that is done. I have 4 cores, by the way.

No, you are actually using the correct syntax here to make 3 cores execute an arbitrary function independently of each other. You cannot magically make 3 cores work together on the same task without explicitly writing that part of the algorithm/code yourself, often using threads (which do not work the same in Python as they do outside the language).

However, you are reinitializing the pool on every loop iteration; you need to do something like this to actually get it right:

cpus_to_run_on = cpu_count() - 1
pool = Pool(processes=cpus_to_run_on)
# Don't call a dictionary "dict": you will not be able to use dict() any
# more after that point. That's like naming a variable len or abs; you
# can't use those built-ins afterwards.
pool.map(your_function, your_function_args)
pool.close()
pool.join()

Take a look at the Python multiprocessing docs for more details if you want to understand better how this works. In Python, you cannot use threads for multiprocessing with the default CPython interpreter. This is due to what is called the global interpreter lock (GIL), which stops concurrent access to resources from within Python itself. The GIL does not exist in some other implementations of the language, and it is not something you have to deal with in other languages like C and C++ (so there you can actually use threads in parallel to work together on a task, unlike in CPython).
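
Here is a small sketch of that difference, comparing a CPU-bound function under a thread pool and a process pool (ThreadPool is the thread-backed counterpart of Pool in the standard library; actual timings will vary by machine):

import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def cpu_bound(n):
    # Pure-Python arithmetic: holds the GIL the whole time
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(pool_cls, label):
    start = time.time()
    with pool_cls(4) as pool:
        pool.map(cpu_bound, [2_000_000] * 4)
    print(label, "took", round(time.time() - start, 2), "s")

if __name__ == '__main__':
    timed(ThreadPool, "threads")    # serialized by the GIL
    timed(Pool, "processes")        # can run on separate cores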

Python gets around this problem by simply launching multiple instances of the interpreter when you use the multiprocessing module, and any message passing between the instances is done by copying data between processes (that is, the same memory is typically not touched by both interpreter instances). This does not, however, happen in the misleadingly named threading module, which often actually slows programs down through a mechanism called context switching. Threading has limited usefulness today, but it provides an easier way around operations that are not GIL-bound, such as socket and file reads/writes, than Python's async machinery.
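
A minimal sketch of that copying behaviour (the counter variable is purely illustrative):

from multiprocessing import Pool

counter = 0  # lives in the parent process

def bump(_):
    global counter
    counter += 1   # mutates this worker process's own copy
    return counter

if __name__ == '__main__':
    with Pool(2) as pool:
        print(pool.map(bump, range(4)))  # workers see only their own copies
    print("parent counter:", counter)    # still 0: data was copied, not shared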

On top of all this, though, there is a bigger problem with your multiprocessing: you are writing to standard output. You won't get the gains you want. Think about it. Each of your processes "prints" data, but it is all displayed on the same terminal/output screen. So even though your processes are "printing", they aren't really doing it independently; the information has to be coalesced back into the process where the text interface (i.e. your console) lives. So each process writes whatever it was going to print to some kind of buffer, which then has to be copied (as we learned from how multiprocessing works) to another process, which then takes that buffered data and outputs it.

Typically, toy programs use printing as a way of showing that there is no ordering between the execution of these processes and that they can finish at different times; they are not meant to demonstrate the performance benefits of multi-core processing.
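
If you want the actual speedup, have the workers return their data and let the parent process do the printing. A minimal sketch:

from multiprocessing import Pool, cpu_count

def work(item):
    key, value = item
    # Return a result instead of printing inside the worker; only the
    # parent process touches the console
    return "For {}: {}".format(key, value)

if __name__ == '__main__':
    d = {k: k**2 for k in range(10)}
    with Pool(processes=(cpu_count() - 1)) as pool:
        for line in pool.map(work, d.items()):
            print(line)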







