How can I improve the performance of numpy in this short code?

I'm trying to figure out why one of my Python scripts is about 4x slower than gfortran, and I narrowed it down to this:

import numpy as np

nvar_x=40
nvar_y=10

def fn_tst(x):
    for i in range(int(1e7)):
        y=np.repeat(x,1+nvar_y)
    return y

x = np.arange(40)
y = fn_tst(x)

print y.min(),y.max()

      

This is about 13 times slower than the following Fortran code:

module test
integer,parameter::nvar_x=40,nvar_y=10
contains
subroutine fn_tst(x,y)
real,dimension(nvar_x)::x
real,dimension(nvar_x*(1+nvar_y))::y

do i = 1,10000000
   do k = 1,nvar_x
      y(k)=x(k)
      ibeg=nvar_x+(k-1)*nvar_y+1
      iend=ibeg+nvar_y-1
      y(ibeg:iend)=x(k)
   enddo
enddo

end subroutine fn_tst
end module test

program tst_cp
use test
real,dimension(nvar_x)::x
real,dimension(nvar_x*(1+nvar_y))::y
do k = 1,nvar_x
   x(k)=k-1
enddo

call fn_tst(x,y)

print *,minval(y),maxval(y)

stop
end

      

Can you suggest ways to speed up the Python script? Other pointers for good performance with numpy would also be appreciated. I would rather stick with Python than create Python wrappers for Fortran routines.

Thanks.

@isedev So that's it, then: 1.2 s for gfortran vs 6.3 s for Python? This is the first time I've worried about performance, but as I said, I could only get about a quarter of gfortran's speed with Python in the code I was trying to speed up.

And you are right, sorry: the two codes did not do the same thing. In fact, what you wrote in the loop looks more like what I have in the original code.

Unless I am missing something, I disagree with the last statement: I need to create y in fn_tst, and np.repeat is only one of the terms on the right-hand side (RHS), whose result is written directly into an existing array. If I comment out the np.repeat term, everything is fast:

rhs_slow = rhs[:J]
rhs_fast = rhs[J:]

rhs_fast[:] = c* ( b*in2[3:-1] * ( in2[1:-3] - in2[4:]  ) - fast) + hc_ovr_b * np.repeat(slow,K) #slow

      



1 answer


For starters, the Python code does not generate the same output as the Fortran code. In Fortran, y is the sequence from 0 to 39, followed by ten 0s, ten 1s, ..., up to ten 39s. The Python code produces eleven 0s, eleven 1s, ..., up to eleven 39s.
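To make the difference concrete, here is a small sketch of the two layouts. The sizes (3 and 2) are chosen only for illustration, and the names y_python and y_fortran are hypothetical; they do not appear in the question.

```python
import numpy as np

nvar_x = 3  # small sizes chosen only for illustration
nvar_y = 2
x = np.arange(nvar_x)

# Layout of the question's Python code: each element repeated 1+nvar_y times
y_python = np.repeat(x, 1 + nvar_y)

# Layout of the Fortran loop: x itself first, then each element nvar_y times
y_fortran = np.concatenate([x, np.repeat(x, nvar_y)])

print(y_python.tolist())   # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(y_fortran.tolist())  # [0, 1, 2, 0, 0, 1, 1, 2, 2]
```

Both arrays have the same length, minimum and maximum, which is why printing only min and max does not reveal the mismatch.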

The following code produces the same output as the Fortran code and performs a similar number of memory allocations as the original code:

import numpy as np

nvar_x = 40
nvar_y = 10

def fn_tst(x):
    for i in range(10000000):
        y = np.empty(nvar_x*(1+nvar_y))
        y[0:nvar_x] = x[0:nvar_x]
        y[nvar_x:] = np.repeat(x,nvar_y)
    return y

x = np.arange(40)
y = fn_tst(x)

print y.min(), y.max()

      

On my system (with only 1,000,000 iterations) the Fortran code runs in 1.2 s and the Python code in 8.6 s.

However, this is not a fair comparison: in the Fortran code, y is allocated once (outside the fn_tst procedure), whereas in the Python code, y is allocated inside the fn_tst function on every iteration.



So, rewriting the Python code as follows provides a better comparison:

import numpy as np

nvar_x = 40
nvar_y = 10

def fn_tst(x,y):
    for i in range(10000000):
        y[0:nvar_x] = x[0:nvar_x]
        y[nvar_x:] = np.repeat(x,nvar_y)
    return y

x = np.arange(40)
y = np.empty(nvar_x*(1+nvar_y))
fn_tst(x,y)

print y.min(), y.max()

      

On my system, the above runs in 6.3 s (again, 1,000,000 iterations). So that is already about 25% faster.
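A comparison of this kind can be reproduced with the standard-library timeit module. The sketch below is only a rough harness: the function names alloc_each_time and reuse_buffer are hypothetical, the iteration count is reduced for a quick check, and the absolute numbers will differ from those quoted above depending on the machine and numpy version.

```python
import timeit
import numpy as np

nvar_x = 40
nvar_y = 10
n_iter = 10000  # assumption: reduced loop count so the check finishes quickly

def alloc_each_time(x):
    # Allocates a fresh y on every iteration, as in the first version
    for i in range(n_iter):
        y = np.empty(nvar_x * (1 + nvar_y))
        y[0:nvar_x] = x
        y[nvar_x:] = np.repeat(x, nvar_y)
    return y

def reuse_buffer(x, y):
    # Reuses a y allocated once by the caller, as in the second version
    for i in range(n_iter):
        y[0:nvar_x] = x
        y[nvar_x:] = np.repeat(x, nvar_y)
    return y

x = np.arange(nvar_x)
y = np.empty(nvar_x * (1 + nvar_y))

t_alloc = timeit.timeit(lambda: alloc_each_time(x), number=5)
t_reuse = timeit.timeit(lambda: reuse_buffer(x, y), number=5)
print("allocate each iteration: %.3f s" % t_alloc)
print("reuse one buffer:        %.3f s" % t_reuse)
```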

In this case, the main performance hit is that numpy.repeat() creates a new array, which must then be copied into y. Things would be much faster if numpy.repeat() could be instructed to place its output directly into an existing array (i.e. y in this case), but that is not possible.
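One possible workaround, which is not part of the code above, is to avoid numpy.repeat() altogether. The repeated part of y consists of nvar_x consecutive runs of nvar_y values, so it can be viewed as a 2-D array of shape (nvar_x, nvar_y) and filled by broadcasting. A minimal sketch, assuming y is a contiguous 1-D array (so that reshape returns a view rather than a copy); the name fn_tst_inplace and the n_iter parameter are hypothetical, and whether this is actually faster should be measured on your own system.

```python
import numpy as np

nvar_x = 40
nvar_y = 10

def fn_tst_inplace(x, y, n_iter=1000):
    # Hypothetical variant of fn_tst; n_iter is reduced for illustration.
    # Reshaping a contiguous slice returns a view, so writes land in y.
    y_rep = y[nvar_x:].reshape(nvar_x, nvar_y)
    x_col = x[:, np.newaxis]  # shape (nvar_x, 1), broadcasts across columns
    for i in range(n_iter):
        y[:nvar_x] = x
        y_rep[:] = x_col      # row k is filled with x[k], without building the repeated array first
    return y

x = np.arange(nvar_x)
y = np.empty(nvar_x * (1 + nvar_y))
fn_tst_inplace(x, y)

print(y.min())
print(y.max())
```

The view y_rep is created once, outside the loop, so each iteration only performs two assignments into memory that y already owns.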
