Python: Why is the multiprocessing lock shared between processes here?

I am trying to share a lock between processes. I understand that the usual way to share a lock is to pass it as an argument to the target function. However, I found that the approach below also works, and I can't figure out how the processes are sharing the lock. Can anyone explain this?

import multiprocessing as mp
import time


class SampleClass:

    def __init__(self):
        self.lock = mp.Lock()
        self.jobs = []
        self.total_jobs = 10

    def test_run(self):
        for i in range(self.total_jobs):
            p = mp.Process(target=self.run_job, args=(i,))
            p.start()
            self.jobs.append(p)

        for p in self.jobs:
            p.join()

    def run_job(self, i):
        with self.lock:
            print('Sleeping in process {}'.format(i))
            time.sleep(5)


if __name__ == '__main__':
    t = SampleClass()
    t.test_run()

      



2 answers


On Windows (which you said you are using), these things always boil down to details of how multiprocessing plays with pickle, because all Python data that crosses a process boundary on Windows is transferred by pickling on the sending end (and unpickling on the receiving end).

My best advice is to avoid doing anything that raises questions like this to begin with :-) For example, the code you showed blows up on Windows under Python 2, and it also blows up under Python 3 if you use multiprocessing.Pool instead of multiprocessing.Process.

It's not just the lock: merely trying to pickle a bound method (such as self.run_job) blows up under Python 2. Think about it: you are crossing a process boundary, and there is no object corresponding to self on the receiving end. To which object should self.run_job be bound on the receiving end?

Under Python 3, pickling self.run_job also pickles a copy of the self object. So that's the answer: a SampleClass object corresponding to self is created by magic on the receiving end. Clear as mud. The entire state of t is pickled, including t.lock. That's why it "works".
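
You can watch the same mechanism outside multiprocessing with a plain pickle round trip. A minimal sketch (the Sample class and its payload attribute are invented for illustration): pickling the bound method drags a copy of the whole instance along with it.

import pickle

class Sample:
    def __init__(self):
        self.payload = 'instance state'

    def method(self):
        return self.payload

s = Sample()
blob = pickle.dumps(s.method)    # pickling the bound method pickles self too
restored = pickle.loads(blob)    # a brand-new Sample is rebuilt on load
print(restored())                # -> 'instance state'
print(restored.__self__ is s)    # -> False: it's a copy, not the original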



See this for more implementation details:

Why can I pass an instance method to multiprocessing.Process, but not to multiprocessing.Pool?

You will run into the fewest mysteries if you stick to things that were obviously intended to work: pass module-global callables (not instance methods or local functions, for example), and explicitly pass multiprocessing data objects (whether a Lock instance, a Queue, a manager.list, and so on).
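
Following that advice, the original example could be rewritten along these lines. This is just a sketch of the pattern (the shortened sleep is mine, to keep the demo quick); the key points are the module-global run_job and the lock passed explicitly as an argument:

import multiprocessing as mp
import time

def run_job(lock, i):
    # module-global callable; the lock arrives as an explicit argument
    with lock:
        print('Sleeping in process {}'.format(i))
        time.sleep(1)

if __name__ == '__main__':
    lock = mp.Lock()
    jobs = [mp.Process(target=run_job, args=(lock, i)) for i in range(10)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()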



On Unix-based operating systems, new processes are created via the fork primitive.

fork works by cloning the memory address space of the parent process and handing the copy to the child. The child gets a copy of the parent's memory, as well as of its file descriptors and shared objects.

This means that when you call fork, if the parent has a file open, the child has it too. The same applies to shared objects such as pipes, sockets, and so on.

On Unix, CPython Locks are implemented on top of the sem_open primitive, which is designed to be shared when a process forks.
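
To make that inheritance concrete, here is a minimal sketch that forces the 'fork' start method (Unix only; requesting it raises ValueError on Windows). Nothing is pickled here, which is why even a nested function works as the target:

import multiprocessing as mp

def main():
    ctx = mp.get_context('fork')  # Unix only
    lock = ctx.Lock()

    def child():
        # the lock is inherited via the cloned address space, never pickled
        with lock:
            print('child holds the inherited lock')

    p = ctx.Process(target=child)
    p.start()
    p.join()

if __name__ == '__main__':
    main()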

I usually recommend against mixing concurrency (multiprocessing in particular) and OOP, because it frequently leads to this kind of misunderstanding.



EDIT:

I only now saw that you are using Windows. Tim Peters gave the correct answer. For the sake of abstraction, Python tries to provide OS-independent behaviour in its API: when you pass an instance method as a target, it pickles the object and sends it over a pipe, giving behaviour similar to what you get on Unix.

I would recommend that you read the programming guidelines for multiprocessing. Your problem is addressed specifically in the first point:

Avoid shared state

As far as possible, you should try to avoid shifting large amounts of data between processes.

It is probably best to stick to using queues or pipes for inter-process communication rather than using the lower-level synchronization primitives.
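
As an illustration of that guideline, here is a sketch that replaces the shared lock with queues (the worker function, the squaring, and the None stop sentinel are my own choices, not part of the guidelines):

import multiprocessing as mp

def worker(tasks, results):
    # pull work from one queue, push answers to the other
    for i in iter(tasks.get, None):  # None is the stop sentinel
        results.put(i * i)

if __name__ == '__main__':
    tasks, results = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker, args=(tasks, results)) for _ in range(2)]
    for p in workers:
        p.start()
    for i in range(10):
        tasks.put(i)
    for _ in workers:
        tasks.put(None)              # one sentinel per worker
    print(sorted(results.get() for _ in range(10)))
    for p in workers:
        p.join()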
