Python / NumPy - Speed Up Monte Carlo Method for Radioactive Decay

I am trying to optimize the Monte Carlo generation of decay times for a radioactive isotope. Given nsims atoms of an isotope with half-life t12, when does each atom decay? The simulation steps through time in increments of dt; in each step an atom survives with probability exp(-ln2*dt/t12), so it decays when a uniform random number exceeds that value. I tried to optimize this by generating the random numbers for all not-yet-decayed atoms at once with a single call to numpy.random.random (I call this method parallel), but I am hoping the performance can be even better. I also show a method that does the calculation for each atom individually (serial).
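To make the per-step numbers concrete: with dt = 0.01 and t12 = 3.1*60 = 186 s, the survival probability per step is exp(-ln2*0.01/186) ≈ 0.999963, i.e. each atom has roughly a 3.7e-5 chance of decaying in any given step.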

import numpy as np
import time
import matplotlib.pyplot as plt
import scipy.optimize

t12 = 3.1*60.    # half-life in seconds
dt = 0.01        # timestep in seconds
ln2 = np.log(2)
decay_exp = lambda t, A, tau: A * np.exp(-t/tau)  # expected decay curve

def serial(nsims):
    # step each atom through time individually until it decays
    sim_start_time = time.perf_counter()
    decay_time = np.zeros(nsims)
    for i in range(nsims):
        t = dt
        while decay_time[i] == 0:
            # decay occurs when the draw exceeds the per-step survival probability
            if np.random.random() > np.exp(-ln2*dt/t12):
                decay_time[i] = t
            t += dt
    sim_end_time = time.perf_counter()
    return (sim_end_time - sim_start_time, decay_time)

def parallel(nsims):
    # draw random numbers for all not-yet-decayed atoms at once each timestep
    sim_start_time = time.perf_counter()
    decay_time = np.zeros(nsims)
    t = dt
    while 0 in decay_time:
        inot_decayed = np.where(decay_time == 0)[0]
        idecay_check = np.random.random(len(inot_decayed)) > np.exp(-ln2*dt/t12)
        decay_time[inot_decayed[np.where(idecay_check==True)[0]]] = t
        t += dt
    sim_end_time = time.perf_counter()
    return (sim_end_time - sim_start_time, decay_time)
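For example, either function can be sanity checked through the mean decay time, which should approach the mean lifetime t12/ln2 ≈ 268.4 s (a quick check, assuming the definitions above):

elapsed, decay_time = parallel(10000)
print(elapsed, decay_time.mean())  # mean should be close to t12/ln2 ≈ 268.4 s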


I am interested in any suggestions that perform better than the parallel function and are pure Python, i.e. not Cython. For large nsims, parallel already improves significantly on serial. Given the performance gain of parallel over serial, is there an even better method?
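As a sanity check on the physics, a histogram of the decay times can be fitted with the decay_exp defined above (a minimal sketch; the bin count is arbitrary):

_, decay_time = parallel(10000)
counts, edges = np.histogram(decay_time, bins=50)
centers = 0.5 * (edges[:-1] + edges[1:])
popt, _ = scipy.optimize.curve_fit(decay_exp, centers, counts, p0=(counts[0], t12/ln2))
print(popt[1])  # fitted tau, should be close to t12/ln2 ≈ 268.4 s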



1 answer


There is still some speed to be gained from your original "parallel" implementation (vectorized is the correct word). Note that these are micro-optimizations, but they still give a slight performance increase.

import numpy as np
t12 = 3.1*60.
dt = 0.01
ln2 = np.log(2)

s = 98765

def parallel(nsims):  # your code, unaltered, except the timing code was removed (timings are done with timeit below)
    decay_time = np.zeros(nsims)
    t = dt
    np.random.seed(s) # also had to add a seed to get comparable results
    while 0 in decay_time:
        inot_decayed = np.where(decay_time == 0)[0]
        idecay_check = np.random.random(len(inot_decayed)) > np.exp(-ln2*dt/t12)
        decay_time[inot_decayed[np.where(idecay_check==True)[0]]] = t
        t += dt
    return decay_time

def parallel_micro(nsims): # micromanaged code
    decay_time = np.zeros(nsims)
    t = dt
    half_time = np.exp(-ln2*dt/t12)  # there was no need to calculate this again in every loop iteration
    np.random.seed(s)  # fixed seed to get comparable results
    while 0 in decay_time:
        inot_decayed = np.where(decay_time == 0)[0]  # only here you need the call to np.where
        # to my own surprise, len(some_array) is quicker than some_array.size (function lookup vs attribute lookup)
        idecay_check = np.random.random(len(inot_decayed)) > half_time
        decay_time[inot_decayed[idecay_check]] = t # no need for another np.where and certainly not for another boolean comparison
        t += dt
    return decay_time
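Since both functions reset the seed and consume the random number stream in the same pattern, you can verify that they return identical results:

assert np.array_equal(parallel(1000), parallel_micro(1000))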


You can perform time measurements using the timeit module. Profiling will tell you that the call to np.where is the bottleneck here.
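For instance, a minimal way to time and profile a run (assuming the functions above are defined in the current session):

import timeit, cProfile

print(timeit.timeit('parallel_micro(10000)', globals=globals(), number=1))
cProfile.run('parallel_micro(10000)', sort='cumtime')  # np.where ranks near the top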

Knowing that np.where is the bottleneck, you can get rid of it like this:



def parallel_micro2(nsims):
    decay_time = np.zeros(nsims)
    t = dt
    half_time = np.exp(-ln2*dt/t12)
    np.random.seed(s)
    indices = np.where(decay_time==0)[0]  # np.where is now called only once, before the loop
    u = len(indices)
    while u:
        decayed = np.random.random(u) > half_time
        decay_time[indices[decayed]] = t
        indices = indices[np.logical_not(decayed)]  # keep only the atoms that have not decayed yet
        u = len(indices)
        t += dt
    return decay_time


And this gives a pretty big speed boost:

In [2]: %timeit -n1 -r1 parallel_micro2(1e4)
1 loops, best of 1: 7.81 s per loop

In [3]: %timeit -n1 -r1 parallel_micro(1e4)
1 loops, best of 1: 29 s per loop

In [4]: %timeit -n1 -r1 parallel(1e4)
1 loops, best of 1: 33.5 s per loop
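That is roughly a 4.3x speedup over the original parallel version, and about 3.7x over parallel_micro.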


Remember to get rid of the np.random.seed call when you're done optimizing.
