Multithreaded Python GIL example

I've read quite a bit about how "bad" this Python GIL business is when writing multi-threaded code, but I've never seen an example. Can someone please give me a basic example of when the GIL causes problems when using threading?

Thanks!



1 answer


One of the main reasons to use multithreading is so that a program can take advantage of multiple CPUs (and/or multiple cores per CPU) to compute more operations per second. But in Python, the GIL means that even if you have multiple threads that are ready to compute at the same time, only one of those threads will actually be running at any given instant, because all the others will be blocked waiting to acquire the global interpreter lock. This means that a CPU-bound multithreaded Python program will actually be slower than its single-threaded equivalent, not faster: only one thread runs at a time, and there is also bookkeeping overhead from forcing each thread to wait for, acquire, and then give up the GIL (round-robin style) every few milliseconds.
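As a small aside on the "every few milliseconds" hand-off: on CPython 3.2 and later, the interval at which a running thread is asked to release the GIL can be inspected with `sys.getswitchinterval()`. This is a minimal sketch that assumes such an interpreter; the script in this answer was written for Python 2, where the hand-off is counted in bytecode instructions rather than in time, so this call is not available there.

```python
import sys

# Assumption: CPython 3.2 or later, where GIL hand-off is time-based.
# The value is in seconds and can be changed with sys.setswitchinterval().
interval = sys.getswitchinterval()
print("GIL switch interval: %.4f seconds" % interval)
```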

To demonstrate the slowdown, here is a toy Python script that spawns a specified number of threads; as its "computation", each thread simply increments a counter continuously until 5 seconds have elapsed. At the end, the main thread sums the counter increments from all threads and prints the total, which gives us an estimate of how much "work" was done during the 5-second period.

import threading
import sys
import time

numSecondsToRun = 5

class CounterThread(threading.Thread):
   def __init__(self):
      threading.Thread.__init__(self)
      self._counter = 0
      self._endTime = time.time() + numSecondsToRun

   def run(self):
      # Simulate a CPU-bound computation: count until the deadline passes
      while time.time() < self._endTime:
         self._counter += 1

if __name__ == "__main__":
   if len(sys.argv) < 2:
      print("Usage:  python counter.py 5")
      sys.exit(5)

   numThreads = int(sys.argv[1])
   # print() with a single %-formatted string works on both Python 2 and 3
   print("Spawning %i counting threads for %i seconds..." % (numThreads, numSecondsToRun))

   # Start all the worker threads
   threads = []
   for i in range(0,numThreads):
      t = CounterThread()
      t.start()
      threads.append(t)

   # Wait for each thread to finish, then add up the work done
   totalCounted = 0
   for t in threads:
      t.join()
      totalCounted += t._counter
   print("Total amount counted was %i" % totalCounted)

      

...and here are the results I get on my computer (which is a dual-core Mac Mini with hyper-threading enabled, FWIW):



$ python counter.py 1
Spawning 1 counting threads for 5 seconds...
Total amount counted was 14210740

$ python counter.py 2
Spawning 2 counting threads for 5 seconds...
Total amount counted was 10398956

$ python counter.py 3
Spawning 3 counting threads for 5 seconds...
Total amount counted was 10588091

$ python counter.py 4
Spawning 4 counting threads for 5 seconds...
Total amount counted was 11091197

$ python counter.py 5
Spawning 5 counting threads for 5 seconds...
Total amount counted was 11130036

$ python counter.py 6
Spawning 6 counting threads for 5 seconds...
Total amount counted was 10771654

$ python counter.py 7
Spawning 7 counting threads for 5 seconds...
Total amount counted was 10464226

      

Notice how the best performance was achieved in the first run (when only one worker thread was spawned); counting throughput dropped significantly once more than one thread was running at a time. This shows how multithreaded performance in Python is crippled by the GIL: the same program written in C (or any other language without a GIL) would perform much better as the number of threads increased, not worse (at least until the number of worker threads matched the number of cores on the hardware, of course).

This does not mean that multithreading is completely useless in Python, though; it is still useful in cases where most or all of your threads are blocked waiting for I/O rather than being CPU-bound. That is because a Python thread that is blocked waiting for I/O does not hold the GIL while it waits, so other threads are free to run during that time. If you need to parallelize a computation-intensive task, though (like ray tracing, or computing all the digits of Pi, or code-breaking, or similar), then you will want to either use multiple processes rather than multiple threads, or use another language that does not have a GIL.
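For comparison, the multiple-process route could look roughly like the sketch below, which redoes the counting experiment with the standard `multiprocessing` module. The names `count_for` and `count_in_processes` are made up for this illustration and are not part of the original script. Each worker process has its own interpreter and its own GIL, so the workers can run on separate cores; no timings are shown here because they depend on the machine.

```python
import multiprocessing
import sys
import time

def count_for(seconds):
    """Increment a local counter until `seconds` have elapsed; return the count."""
    end_time = time.time() + seconds
    counter = 0
    while time.time() < end_time:
        counter += 1
    return counter

def count_in_processes(num_processes, seconds):
    """Run count_for in `num_processes` worker processes and sum their counts."""
    pool = multiprocessing.Pool(processes=num_processes)
    try:
        # One task per worker; each task starts its own timer when it begins
        return sum(pool.map(count_for, [seconds] * num_processes))
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    # Hypothetical command line: python counter_mp.py <numProcesses>
    num_processes = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    seconds = 5
    print("Spawning %i counting processes for %i seconds..." % (num_processes, seconds))
    print("Total amount counted was %i" % count_in_processes(num_processes, seconds))
```

The trade-off is that processes do not share memory the way threads do, so each worker returns its count to the parent instead of updating a shared object.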
