Why doesn't Python's math.factorial play well with threads?

Why does math.factorial behave so strangely in a thread?

Here's an example. It creates three threads:

  • a thread that just sleeps for a while
  • a thread that increments an int for a while
  • a thread that calls math.factorial on a large number.

It calls start on each thread, then join with a timeout.

The sleep and spin threads work as expected: start returns immediately, and then join waits for the timeout.

The factorial thread, on the other hand, does not return from start until the thread has run to completion!

import sys
from threading import Thread
from time import sleep, time
from math import factorial

# Helper class that stores a start time to compare to
class timed_thread(Thread):
    def __init__(self, time_start):
        Thread.__init__(self)
        self.time_start = time_start

# Thread that just executes sleep()
class sleep_thread(timed_thread):
    def run(self):
        sleep(15)
        print "st DONE:\t%f" % (time() - time_start)

# Thread that increments a number for a while       
class spin_thread(timed_thread):
    def run(self):
        x = 1
        while x < 120000000:
            x += 1
        print "sp DONE:\t%f" % (time() - time_start)

# Thread that calls math.factorial with a large number
class factorial_thread(timed_thread):
    def run(self):
        factorial(50000)
        print "ft DONE:\t%f" % (time() - time_start)

# the tests

print
print "sleep_thread test"
time_start = time()

st = sleep_thread(time_start)
st.start()
print "st.start:\t%f" % (time() - time_start)
st.join(2)
print "st.join:\t%f" % (time() - time_start)
print "sleep alive:\t%r" % st.isAlive()


print
print "spin_thread test"
time_start = time()

sp = spin_thread(time_start)
sp.start()
print "sp.start:\t%f" % (time() - time_start)
sp.join(2)
print "sp.join:\t%f" % (time() - time_start)
print "sp alive:\t%r" % sp.isAlive()

print
print "factorial_thread test"
time_start = time()

ft = factorial_thread(time_start)
ft.start()
print "ft.start:\t%f" % (time() - time_start)
ft.join(2)
print "ft.join:\t%f" % (time() - time_start)
print "ft alive:\t%r" % ft.isAlive()

      

And here is the output in Python 2.6.5 on CentOS x64:

sleep_thread test
st.start:       0.000675
st.join:        2.006963
sleep alive:    True

spin_thread test
sp.start:       0.000595
sp.join:        2.010066
sp alive:       True

factorial_thread test
ft DONE:        4.475453
ft.start:       4.475589
ft.join:        4.475615
ft alive:       False
st DONE:        10.994519
sp DONE:        12.054668

      

I tried this on Python 2.6.5 on CentOS x64 and on Python 2.7.2 on Windows x86. On both, the factorial thread's start does not return until the thread has finished executing.

I also tried this with PyPy 1.8.0 on Windows x86 and the result is slightly different. start returns immediately, but then join does not time out!

sleep_thread test
st.start:       0.001000
st.join:        2.001000
sleep alive:    True

spin_thread test
sp.start:       0.000000
sp DONE:        0.197000
sp.join:        0.236000
sp alive:       False

factorial_thread test
ft.start:       0.032000
ft DONE:        9.011000
ft.join:        9.012000
ft alive:       False
st DONE:        12.763000

      

I tried IronPython 2.7.1 too; it produces the expected output.

sleep_thread test
st.start:       0.023003
st.join:        2.028122
sleep alive:    True

spin_thread test
sp.start:       0.003014
sp.join:        2.003128
sp alive:       True

factorial_thread test
ft.start:       0.002991
ft.join:        2.004105
ft alive:       True
ft DONE:        5.199295
sp DONE:        5.734322
st DONE:        10.998619

      


2 answers


Because of the Global Interpreter Lock (GIL), threads in Python generally take turns running, interleaving their work rather than executing at the same time.

If you look at the Python bytecode:

from math import factorial

def fac_test(x):
    factorial(x)

import dis
dis.dis(fac_test)

      



You get:

  4           0 LOAD_GLOBAL              0 (factorial)
              3 LOAD_FAST                0 (x)
              6 CALL_FUNCTION            1
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        

      

As you can see, the call to math.factorial is a single operation at the Python bytecode level (6 CALL_FUNCTION); it is implemented in C. factorial does not release the GIL because of the type of work it does (see the comments on my answer), so Python does not switch to other threads while it is running, and you get the result that you observed.
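To see the effect in isolation, here is a minimal sketch. It is written in Python 3 syntax, unlike the Python 2 code in the question, and the helper names longest_stall and spin are made up for illustration. A background thread records a timestamp every millisecond while the calling thread does some work; the largest gap between timestamps shows how long the background thread was kept from running.

```python
import threading
from math import factorial
from time import perf_counter, sleep


def longest_stall(work, tick=0.001):
    """Run work() in the calling thread while a background thread ticks.

    Returns the largest gap, in seconds, between two consecutive ticks.
    (Hypothetical helper, for illustration only.)
    """
    stamps = []
    done = threading.Event()

    def ticker():
        # Record a timestamp, then sleep briefly, until told to stop.
        while not done.is_set():
            stamps.append(perf_counter())
            sleep(tick)

    t = threading.Thread(target=ticker)
    t.start()
    sleep(10 * tick)  # let the ticker record a few stamps first
    try:
        work()
    finally:
        done.set()
        t.join()
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]
    return max(gaps) if gaps else 0.0


def spin():
    # Pure-Python loop: many bytecodes, so the interpreter can switch threads.
    x = 0
    while x < 2000000:
        x += 1


if __name__ == "__main__":
    print("spin stall:\t%f" % longest_stall(spin))
    # One C-level call: no bytecode boundary at which to hand over the GIL.
    print("factorial stall:\t%f" % longest_stall(lambda: factorial(100000)))
```

On CPython I would expect the factorial stall to be close to the full duration of the factorial call, while the spin stall stays short. The exact numbers depend on the machine and the interpreter version.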



Python has a Global Interpreter Lock (GIL), which requires CPU-bound threads to take turns executing rather than run concurrently. Since the factorial function is written in C and does not release the GIL, even setting sys.setswitchinterval is not sufficient to allow the threads to cooperate.
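For reference, a small sketch of what adjusting the switch interval looks like. Note that sys.setswitchinterval exists in Python 3.2 and later (Python 2 has sys.setcheckinterval instead), so this is Python 3 code. The switch_interval context manager is a hypothetical convenience wrapper, not part of the standard library.

```python
import sys
from contextlib import contextmanager
from math import factorial


@contextmanager
def switch_interval(seconds):
    """Temporarily change the thread switch interval (hypothetical helper)."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield
    finally:
        # Always restore the previous interval, even if the body raises.
        sys.setswitchinterval(previous)


if __name__ == "__main__":
    with switch_interval(0.0001):
        # The interval only controls how often the interpreter offers to
        # hand over the GIL between bytecodes. This is one C-level call,
        # so a shorter interval does not let other threads in sooner.
        factorial(50000)
```

The interval affects switching between bytecode instructions only, which is why it does not help with a long-running C function that holds the GIL.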



The multiprocessing module provides Process objects, which are similar to threads but operate in different address spaces. For CPU-bound tasks, you should seriously consider using the multiprocessing module.
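As a rough sketch of that suggestion, here is the factorial test from the question rewritten with a Process instead of a Thread. It uses Python 3 syntax. The names factorial_bit_length and worker are made up for illustration, and the result is reduced to a bit length only so that something small is sent back through the queue.

```python
from math import factorial
from multiprocessing import Process, Queue
from time import time


def factorial_bit_length(n):
    """Return the number of bits in n! (a CPU-bound computation)."""
    return factorial(n).bit_length()


def worker(n, results):
    # Runs in a separate process with its own interpreter and its own GIL.
    results.put(factorial_bit_length(n))


if __name__ == "__main__":
    results = Queue()
    time_start = time()

    fp = Process(target=worker, args=(50000, results))
    fp.start()
    print("fp.start:\t%f" % (time() - time_start))
    fp.join(2)  # wait at most 2 seconds, as in the question
    print("fp.join:\t%f" % (time() - time_start))
    print("fp alive:\t%r" % fp.is_alive())

    # Blocks until the worker has produced its result.
    print("bits:\t%d" % results.get())
    fp.join()
```

Because the work happens in another process, start should return right away and the parent stays responsive while the factorial is computed. The trade-off is that arguments and results have to be pickled and passed between processes.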
