Speed up Newton-Raphson in pandas / Python

I am currently iterating through a very large dataset of ~85 GB (~600M rows) and using Newton-Raphson to compute a new parameter for each row. My code is very slow at the moment. Any advice on how to speed it up? The methods of BSCallClass and BSPutClass are private, so there is nothing really to speed up there. Thanks.

from numpy import nan


class NewtonRaphson:

    def __init__(self, theObject):
        self.theObject = theObject

    def solve(self, Target, Start, Tolerance, maxiter=500):
        # Newton-Raphson update: x <- x + (Target - Price(x)) / Vega(x)
        y = self.theObject.Price(Start)
        x = Start
        i = 0
        while abs(y - Target) > Tolerance:
            i += 1
            d = self.theObject.Vega(x)
            x += (Target - y) / d
            y = self.theObject.Price(x)
            if i > maxiter:
                # Give up: no convergence within maxiter iterations
                x = nan
                break
        return x


def main():
    # `a` is the DataFrame holding the option data
    for idx, row in a.iterrows():
        print(row["X.1"])
        T = (row["X.7"] - row["X.8"]).days
        Spot = row["X.2"]
        Strike = row["X.9"]
        MktPrice = abs(row["X.10"] - row["X.11"]) / 2
        CPflag = row["X.6"]

        if CPflag == 'call':
            option = BSCallClass(0, 0, T, Spot, Strike)
        elif CPflag == 'put':
            option = BSPutClass(0, 0, T, Spot, Strike)
        else:
            continue  # unknown flag: skip instead of reusing the previous option

        # .loc avoids chained-indexing assignment (a["X.15"][idx] = ...)
        a.loc[idx, "X.15"] = NewtonRaphson(option).solve(MktPrice, .05, .0001)

      

EDIT:

For the curious, I ended up speeding up this whole process significantly by following the suggestion in the answer below and by using the multiprocessing module.
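For readers wondering what the multiprocessing part might look like, here is a minimal sketch. It is not the asker's actual code. The names `solve_row`, `solve_chunk`, `chunked` and `solve_parallel` are hypothetical, and `solve_row` is only a stand-in (a Newton square-root solve) for the real per-row computation. The idea is to split the rows into chunks and hand each chunk to a worker process with `multiprocessing.Pool.map`.

```python
from multiprocessing import Pool


def solve_row(target, tol=1e-10, maxiter=50):
    """Stand-in for the per-row solve: Newton's method for x**2 = target."""
    x = max(target, 1.0)
    for _ in range(maxiter):
        step = (x * x - target) / (2.0 * x)
        x -= step
        if abs(step) < tol:
            return x
    return float("nan")  # no convergence within maxiter iterations


def solve_chunk(rows):
    """Solve every row of one chunk; this is what runs inside a worker."""
    return [solve_row(r) for r in rows]


def chunked(seq, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def solve_parallel(rows, processes=4, chunk_size=10000):
    """Distribute chunks over a pool of workers and flatten the results in order."""
    pool = Pool(processes)
    try:
        results = pool.map(solve_chunk, chunked(rows, chunk_size))
    finally:
        pool.close()
        pool.join()
    return [x for chunk in results for x in chunk]
```

On platforms that start workers by spawning a fresh interpreter (Windows, for example), `solve_parallel` should be called from under an `if __name__ == "__main__":` guard. The process count and chunk size here are arbitrary placeholders and would need tuning for a dataset of this size.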



1 answer


Don't code your own Newton-Raphson method in Python. You will get better performance by using one of the root finders in scipy.optimize, such as brentq or newton. (Presumably, if you have pandas installed, you have scipy installed as well.)


Back-of-the-envelope calculation:

Making 600M calls to brentq should be manageable on standard hardware:

import scipy.optimize as optimize
def f(x):
    return x**2 - 2

In [28]: %timeit optimize.brentq(f, 0, 10)
100000 loops, best of 3: 4.86 us per loop

      



So if each call to optimize.brentq takes 4.86 μs, 600M calls will take about 4.86 * 600 ≈ 2900 seconds, or a little under 1 hour.
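The same estimate written out as a tiny script, in case you want to plug in your own timings. The helper name `estimated_seconds` is just for illustration. Keep in mind that it assumes a single process and the trivial test function timed above; a real pricing function will cost more per call.

```python
def estimated_seconds(per_call_us, n_calls):
    """Total single-process run time in seconds, given the time per call in microseconds."""
    return per_call_us * 1e-6 * n_calls


total = estimated_seconds(4.86, 600e6)  # 4.86 us per call, 600M calls
print("%.0f s, about %.0f min" % (total, total / 60.0))
```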


newton may be slower, but is still manageable:

def f(x):
    return x**2 - 2
def fprime(x):
    return 2*x

In [40]: %timeit optimize.newton(f, 10, fprime)
100000 loops, best of 3: 8.22 us per loop
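Applied to the problem in the question, a brentq-based solve might look roughly like the sketch below. Since BSCallClass is private, `bs_call_price` is a hypothetical stand-in: a textbook Black-Scholes call price with the rate defaulting to zero. `norm_cdf` and `implied_vol` are likewise made-up names, and the bracket [1e-6, 5.0] is an assumed range for the volatility.

```python
import math

from scipy import optimize


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price(sigma, spot, strike, t_years, rate=0.0):
    """Black-Scholes price of a European call (stand-in for the private class)."""
    sqrt_t = math.sqrt(t_years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * t_years) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t_years) * norm_cdf(d2)


def implied_vol(target_price, spot, strike, t_years, rate=0.0, lo=1e-6, hi=5.0):
    """Find sigma in [lo, hi] whose model price matches target_price, or nan."""
    def objective(sigma):
        return bs_call_price(sigma, spot, strike, t_years, rate) - target_price

    try:
        return optimize.brentq(objective, lo, hi, xtol=1e-8)
    except ValueError:
        # brentq needs a sign change over [lo, hi]; without one there is no root here
        return float("nan")


# Round trip: price an option at sigma = 0.2, then recover sigma from the price
price = bs_call_price(0.2, 100.0, 100.0, 1.0)
print(implied_vol(price, 100.0, 100.0, 1.0))
```

Unlike Newton-Raphson, brentq needs no derivative (no Vega call), and it cannot wander outside the bracket, so rows with no solution come back as nan instead of looping until maxiter.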

      
