Understanding this multi-threaded python daemon code

So I am new to Python and am working on a file system event handler. I came across the watchdog API and saw some multi-threaded code there that I can't figure out.

Here is the code that is posted on their website:

import sys
import time
import logging
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    event_handler = LoggingEventHandler()
    observer = Observer()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

      

This code starts an endless loop, listens to some folder, and logs what it sees to the console. My doubt is about the part towards the bottom of the code.

So you start an observer, then go into an endless loop until some key is pressed. My guess is that somewhere in the observer.start() code they also set daemon = True. With some key press, the program leaves the loop and stops the observer. In the watchdog API, the stop() definition says it stops the daemon thread.

1) Then it executes join(). But what is the need for this join? I have already stopped the daemon thread. Doesn't join() mean "wait for all threads to stop, and only then exit the program"? Can I remove join() from the code? After removing it, my program still works correctly.

2) I also don't understand the need for sleep(1) inside the while loop. What happens if I just put a "pass" statement there? I assume the while loop will consume more resources. And the reason we set the sleep time to 1 second rather than 2-3 seconds is that, in the worst case, it would then take the user 2-3 seconds to close the program. But I could be wrong.



1 answer


  • Remember that the daemon thread runs inside the parent process. You need to keep the parent process alive while this thread is running, otherwise the thread will be killed as the program exits (and probably not gracefully). join() ensures that the process stays alive until all the threads have actually exited; just because you called stop() does not guarantee that the thread has actually finished executing. stop() is a request to stop a thread; it does not block until the thread finishes (nor should it, so that the parent thread can call stop() on many child threads at the same time).

  • This is purely to lower CPU consumption. If you just had pass in there, the CPU would run the loop as fast as possible, without pauses. The sleep() call voluntarily gives the CPU up to other processes, since the loop knows it doesn't need to react quickly to any particular condition. And you're essentially right: because it is sleep(1), your worst-case response time is roughly 1 second.
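The CPU-usage point in the second bullet can be checked with a small standard-library sketch. The helper names busy_wait and sleepy_wait and the 0.2-second duration are made up for illustration, and the exact numbers will vary from machine to machine.

```python
import time


def busy_wait(duration):
    """Spin with `pass` for `duration` seconds; return CPU seconds consumed."""
    cpu_start = time.process_time()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        pass  # loops as fast as possible and never yields the CPU
    return time.process_time() - cpu_start


def sleepy_wait(duration, step=0.05):
    """Sleep in small steps for `duration` seconds; return CPU seconds consumed."""
    cpu_start = time.process_time()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        time.sleep(step)  # hands the CPU to other processes while waiting
    return time.process_time() - cpu_start


if __name__ == "__main__":
    print("busy loop CPU seconds: ", busy_wait(0.2))
    print("sleep loop CPU seconds:", sleepy_wait(0.2))
```

Both loops take about the same wall-clock time, but the busy loop should report CPU time close to the full duration while the sleeping loop should report close to zero.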

UPDATE:

Here's an example of why it's important to have join(). Let's say the following was running on the thread:

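import os

while not self.stopped:  # self.stopped is set to True when stop() is called
    ...
    self.results.append(item)  # do some work that involves appending results to a list
# once the loop ends, write all collected results to a file
with open(os.path.expanduser('~/output.txt'), 'w') as outfile:  # open() does not expand '~' itself
    outfile.write('\n'.join(str(item) for item in self.results))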

      



When stop() is called, the while loop ends, the results file is opened, and the write begins. If join() is not called, the process may exit before the write completes, resulting in corrupted results. join() ensures that the parent thread waits for this write to finish. It also ensures that the process waits for the current iteration of the while loop to complete; without join(), you could not only miss writing the file but also terminate in the middle of the while block.

If, however, the thread on which stop() was called doesn't do anything lengthy after the while loop finishes, join() returns almost instantly and therefore basically turns into a NOP.
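To make the difference between stop() and join() concrete, here is a minimal, self-contained sketch using only the standard library. It is not watchdog's implementation: the Worker class, its results list, the 0.2-second "slow cleanup", and the temporary output path are all hypothetical stand-ins for the thread described above.

```python
import os
import tempfile
import threading
import time


class Worker(threading.Thread):
    """Hypothetical worker: collects items until asked to stop, then saves them."""

    def __init__(self, out_path):
        super().__init__(daemon=True)  # daemon threads are killed when the process exits
        self.out_path = out_path
        self.results = []
        self._stop_requested = threading.Event()

    def stop(self):
        # Only a request: returns immediately and does not wait for the thread.
        self._stop_requested.set()

    def run(self):
        while not self._stop_requested.is_set():
            self.results.append(len(self.results))
            time.sleep(0.01)
        time.sleep(0.2)  # simulate slow cleanup after the loop ends
        with open(self.out_path, "w") as outfile:
            outfile.write("\n".join(str(item) for item in self.results))


def run_demo():
    out_path = os.path.join(tempfile.mkdtemp(), "output.txt")
    worker = Worker(out_path)
    worker.start()
    time.sleep(0.05)
    worker.stop()
    alive_after_stop = worker.is_alive()  # still running its cleanup at this point
    worker.join()                         # block until the file has been written
    return alive_after_stop, os.path.exists(out_path)


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the thread is still alive right after stop() returns, and the file is only guaranteed to exist once join() has returned. If the main program exited between those two calls, the daemon thread would be killed before the write.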

UPDATE 2:

With regard to the sleep() call, certain events (for example Ctrl+C) can break out of even a sleep() call in the parent process. Therefore, in this particular case, the length of the sleep does not really matter. Setting it to 1 second is just a convention to make it clear that you are basically yielding the CPU and not really sleeping.
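One possible variation, sketched here under assumptions, is to wait on the thread itself with a timeout instead of calling sleep(), so the loop also ends by itself if the thread dies. The example uses a plain threading.Thread as a stand-in for the observer, and wait_for is a hypothetical helper, not part of watchdog.

```python
import threading
import time


def wait_for(thread, poll_interval=1.0):
    """Block until `thread` finishes, waking up every `poll_interval` seconds.

    Returns the number of times join() timed out while the thread was still alive.
    """
    wakeups = 0
    while thread.is_alive():
        thread.join(timeout=poll_interval)  # returns early once the thread ends
        if thread.is_alive():
            wakeups += 1
    return wakeups


if __name__ == "__main__":
    # Stand-in worker that just runs for 0.3 seconds.
    worker = threading.Thread(target=time.sleep, args=(0.3,), daemon=True)
    worker.start()
    try:
        wait_for(worker, poll_interval=0.1)
    except KeyboardInterrupt:
        pass  # Ctrl+C ends the wait, as in the original loop
    print("worker finished:", not worker.is_alive())
```

As with sleep(1), the timeout here mostly sets how often the main thread wakes up rather than how quickly the program can be stopped.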
