Cython: read files in parallel and bypass the GIL
I'm trying to figure out how to use Cython to bypass the GIL and download files in parallel for I/O-bound tasks. At the moment I have the following Cython code, which tries to load the files n0.npy, n1.npy ... n99.npy:
from cython.parallel import prange
import numpy as np

def foo_parallel():
    cdef int i
    for i in prange(100, nogil=True, num_threads=8):
        with gil:
            np.load('n' + str(i) + '.npy')
    return []
def foo_serial():
    cdef int i
    for i in range(100):
        np.load('n' + str(i) + '.npy')
    return []
I don't notice any significant speed-up - does anyone have any experience with this?
Edit: I am getting about 900 ms in parallel versus 1.3 s serially. I would expect more speed-up given 8 threads.
As the comment says, you cannot call NumPy while holding the GIL and expect the loop to run in parallel. To do this, you need C- or C++-level file operations. See this post for a potential solution: http://www.code-corner.de/?p=183
i.e., apply the file_io.pyx approach there to your problem.
I would post the code here, but I can't figure out how to do that on my phone. Add nogil to the end of the cdef function declaration, and call that function from a cpdef foo_parallel function inside your prange loop. Use read_file (not the slow variant) and change it to cdef. Please post your benchmarks after that, as I'm curious and I don't have a computer with me on vacation.
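A minimal sketch of what this answer describes, under the assumption that the reader function looks roughly like the one in the linked post (the names read_file and foo_parallel are illustrative, not taken from it). The file is read through C stdio, so the cdef function can be declared nogil and called directly inside prange; the .pyx must be compiled with OpenMP enabled for prange to actually spawn threads:

```
# file_io.pyx -- sketch only; compile with OpenMP enabled
from cython.parallel import prange
from libc.stdio cimport FILE, fopen, fread, fclose
from libc.stdlib cimport malloc, free

cdef int read_file(const char* path) nogil:
    # Pure C-level I/O: no Python objects are touched, so this
    # whole function can run with the GIL released.
    cdef FILE* f = fopen(path, "rb")
    cdef char buf[4096]
    if f == NULL:
        return -1
    while fread(buf, 1, sizeof(buf), f) > 0:
        pass
    fclose(f)
    return 0

cpdef foo_parallel(list paths):
    cdef int i, n = len(paths)
    # Convert the Python strings to C pointers up front, while we
    # still hold the GIL; the bytes objects in `encoded` keep the
    # underlying buffers alive for the duration of the loop.
    encoded = [p.encode() for p in paths]
    cdef const char** cpaths = <const char**>malloc(n * sizeof(char*))
    for i in range(n):
        cpaths[i] = encoded[i]
    for i in prange(n, nogil=True, num_threads=8):
        read_file(cpaths[i])
    free(cpaths)
```

To build it, something like extra_compile_args=["-fopenmp"] and extra_link_args=["-fopenmp"] on the Extension in setup.py is needed; without OpenMP, prange silently falls back to a serial loop.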
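As an aside (not part of the answer above): since CPython releases the GIL during blocking file reads, a plain thread pool already overlaps I/O-bound loads without any Cython. A minimal sketch, using ordinary byte files as stand-ins for the .npy files; the names load and load_parallel are my own:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def load(path):
    # The read() call releases the GIL while blocked on disk I/O,
    # so multiple threads can be inside it at once.
    with open(path, "rb") as f:
        return f.read()

def load_parallel(paths, workers=8):
    # map() preserves input order, so blobs[i] corresponds to paths[i].
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load, paths))

if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    paths = []
    for i in range(10):
        p = os.path.join(tmp, "n%d.npy" % i)
        with open(p, "wb") as f:
            f.write(bytes([i]) * 1024)
        paths.append(p)
    blobs = load_parallel(paths)
    print(len(blobs))  # → 10
```

With np.load the same pattern applies, though deserialization of the array data still contends for the GIL; only the raw disk reads overlap.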