scikit-learn Parallel Processing, 0.17 to 0.18 (Python 2.7)

For some reason, the code below uses all available cores, although I set n_jobs to 1. Am I missing something, or should I file an issue with scikit-learn?

import numpy as np
from sklearn import linear_model

liReg = linear_model.LinearRegression(n_jobs=1)

a = np.random.rand(10000, 20)
b = np.random.rand(10000)

# Fit and predict repeatedly so the CPU usage is easy to observe.
for i in range(1000):
    liReg.fit(a, b)
    liReg.predict(a)


I have two identical servers, but one runs scikit-learn v0.18 and the other v0.17 - this only happens when using 0.18.

Here's the result of time python example.py:

Using 0.17 - just uses one core:

real    0m8.381s
user    0m6.387s
sys     0m1.677s


Using 0.18 - uses all cores:

real    0m32.308s # I guess longer due to overhead of parallel process management
user    2m53.612s
sys     20m48.285s




1 answer


From @GaelVaroquaux on GitHub: https://github.com/scikit-learn/scikit-learn/issues/8883#issuecomment-301567818

You are most likely using a linear algebra library with parallelism enabled (such as MKL or OpenBLAS). Hence it is not scikit-learn that is doing the parallel computation, and scikit-learn cannot control it (the parallelism happens inside the component scikit-learn uses internally). You need to figure out how to control that underlying computational building block.
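If you are not sure which BLAS implementation your NumPy build is linked against, one quick way to check is the snippet below (a sketch; the exact sections printed vary with the NumPy build):

import numpy as np

# Prints the BLAS/LAPACK build information for this NumPy installation;
# look for sections such as openblas_info or mkl_info.
np.__config__.show()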



In my case I was using OpenBLAS on Fedora Linux, so I simply added

export OPENBLAS_NUM_THREADS=1

to my .bashrc to turn off multithreading in the linear algebra calls.
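If you would rather not change your shell configuration, the same limit can be set per script. A minimal sketch, assuming OpenBLAS (MKL builds read MKL_NUM_THREADS instead); the variable has to be set before NumPy is imported, since the BLAS thread pool is configured when the library loads:

import os

# Set before "import numpy", otherwise the setting may be ignored.
os.environ["OPENBLAS_NUM_THREADS"] = "1"   # OpenBLAS
# os.environ["MKL_NUM_THREADS"] = "1"      # use this instead for MKL builds

import numpy as np
from sklearn import linear_model

# BLAS calls made during fit/predict should now run single-threaded.
liReg = linear_model.LinearRegression(n_jobs=1)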
