Cython: read files in parallel and bypass the GIL
I'm trying to figure out how to use Cython to bypass the GIL and download files in parallel for I/O-bound tasks. At the moment I have the following Cython code, which tries to load the files n0.npy, n1.npy ... n99.npy:
from cython.parallel import prange
import numpy as np

def foo_parallel():
    cdef int i
    for i in prange(100, nogil=True, num_threads=8):
        with gil:
            np.load('n' + str(i) + '.npy')
    return []
def foo_serial():
    cdef int i
    for i in range(100):
        np.load('n' + str(i) + '.npy')
    return []
I don't notice any significant speed-up - does anyone have any experience with this?
Edit: I am getting about 900 ms in parallel versus 1.3 s serially. I would expect more speed-up given 8 threads.
As the comment says, you cannot call NumPy while holding the GIL and expect the loop to run in parallel. To do this, you need C- or C++-level file operations. See this post for a potential solution: http://www.code-corner.de/?p=183
i.e., apply the file_io.pyx approach there to your problem.
I would post the code here, but I can't figure out how to do that on my phone. Add nogil to the end of the cdef function declaration, and call that function from a cpdef foo_parallel function inside your prange loop. Use read_file (not the slow variant) and change it to cdef. Please post your benchmarks after that, as I'm curious and I don't have a computer with me on vacation.
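A minimal sketch of what this answer describes, under the assumption that the reader function looks roughly like the one in the linked post (the names read_file and foo_parallel are illustrative, not taken from it). The file is read through C stdio, so the cdef function can be declared nogil and called directly inside prange; the .pyx must be compiled with OpenMP enabled for prange to actually spawn threads:

```
# file_io.pyx -- sketch only; compile with OpenMP enabled
from cython.parallel import prange
from libc.stdio cimport FILE, fopen, fread, fclose
from libc.stdlib cimport malloc, free

cdef int read_file(const char* path) nogil:
    # Pure C-level I/O: no Python objects are touched, so this
    # whole function can run with the GIL released.
    cdef FILE* f = fopen(path, "rb")
    cdef char buf[4096]
    if f == NULL:
        return -1
    while fread(buf, 1, sizeof(buf), f) > 0:
        pass
    fclose(f)
    return 0

cpdef foo_parallel(list paths):
    cdef int i, n = len(paths)
    # Convert the Python strings to C pointers up front, while we
    # still hold the GIL; the bytes objects in `encoded` keep the
    # underlying buffers alive for the duration of the loop.
    encoded = [p.encode() for p in paths]
    cdef const char** cpaths = <const char**>malloc(n * sizeof(char*))
    for i in range(n):
        cpaths[i] = encoded[i]
    for i in prange(n, nogil=True, num_threads=8):
        read_file(cpaths[i])
    free(cpaths)
```

To build it, something like extra_compile_args=["-fopenmp"] and extra_link_args=["-fopenmp"] on the Extension in setup.py is needed; without OpenMP, prange silently falls back to a serial loop.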
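As an aside (not part of the answer above): since CPython releases the GIL during blocking file reads, a plain thread pool already overlaps I/O-bound loads without any Cython. A minimal sketch, using ordinary byte files as stand-ins for the .npy files; the names load and load_parallel are my own:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def load(path):
    # The read() call releases the GIL while blocked on disk I/O,
    # so multiple threads can be inside it at once.
    with open(path, "rb") as f:
        return f.read()

def load_parallel(paths, workers=8):
    # map() preserves input order, so blobs[i] corresponds to paths[i].
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load, paths))

if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    paths = []
    for i in range(10):
        p = os.path.join(tmp, "n%d.npy" % i)
        with open(p, "wb") as f:
            f.write(bytes([i]) * 1024)
        paths.append(p)
    blobs = load_parallel(paths)
    print(len(blobs))  # → 10
```

With np.load the same pattern applies, though deserialization of the array data still contends for the GIL; only the raw disk reads overlap.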