Summation Performance on Small Arrays

I have some code that could use some speedup. Unfortunately, the bottleneck already seems to be in the numpy code itself: to my surprise, it is np.sum, not my logarithms, that consumes most of the time.

Here is a toy example:

import numpy as np
from scipy.special import xlogy

p = np.random.rand(20, 4)

def many_foo(p):
    for i in range(100000):
        foo(p)

def foo(p):
    p_bar = xlogy(p, p)    # elementwise p * log(p)
    p_sum = p_bar.sum()    # ndarray method
    p_sum = np.sum(p_bar)  # free function, kept alongside for comparison

kernprof (%lprun -f foo many_foo(p)) gives the following line-by-line profile:

Timer unit: 1e-06 s

Total time: 1.43734 s
File: <ipython-input-63-8f7d1e3cad35>
Function: foo at line 10

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    10                                           def foo(p):
    11    100000       259856      2.6     18.1      p_bar = xlogy(p, p)
    12    100000       404256      4.0     28.1      p_sum = p_bar.sum()
    13    100000       773232      7.7     53.8      p_sum = np.sum(p_bar)


The results surprise me. p_bar.sum() outperforming np.sum(p_bar) by 2x hints at significant per-call overhead.
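
To double-check outside the profiler, here is a minimal timeit sketch (numbers will vary by machine; np.add.reduce is included on the assumption that np.sum ultimately delegates to it, so the gap should isolate the dispatch overhead):

import timeit
import numpy as np
from scipy.special import xlogy

p = np.random.rand(20, 4)
p_bar = xlogy(p, p)

# Time three ways of summing the same small 20x4 array.
for stmt in ("p_bar.sum()", "np.sum(p_bar)", "np.add.reduce(p_bar, axis=None)"):
    t = timeit.timeit(stmt, globals=globals(), number=100_000)
    print(f"{stmt:35} {t:.3f} s")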

But why is the logarithm computed so quickly compared to the sum? And is there any hope of speeding this up without switching to C++?

Note that the small dimensions of p are representative of the problem I have to solve.
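
For what it's worth, the only pure-numpy variant I can think of is to call np.add.reduce directly, on the assumption that np.sum spends its extra time in Python-level argument handling before delegating (a sketch, not a verified fix):

import numpy as np
from scipy.special import xlogy

def foo_reduce(p):
    p_bar = xlogy(p, p)
    # np.add.reduce skips np.sum's Python-level dispatch; with only
    # 80 elements, that fixed overhead can dominate the actual summation.
    return np.add.reduce(p_bar, axis=None)

If even that is not enough, batching many small arrays into one larger array, so that the per-call overhead is amortized, is the other option I see short of C++.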
