Why does the number of MPI processes change the result, and how do I implement broadcasting?

I have written a Python program to calculate pi. Then I decided to rewrite it with mpi4py so that it runs on multiple processes. The program works, but it returns a different value for pi than the original Python version. As I looked into this further, I found that it returns a less accurate value the more CPUs I run it on. Why does the MPI version change its result with more processors? Also, would it be more appropriate to use a broadcast rather than sending many separate messages? How can I implement broadcasting if it is more efficient?

MPI version:

#!/apps/moose/miniconda/bin/python
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
name = MPI.Get_processor_name()
def f(x):
    return (1-(float(x)**2))**float(0.5)
n = 1000000
nm = dict()
pi = dict()
for i in range(1,size+1):
    if i == size:
        nm[i] = (i*n/size)+1
    else:
        nm[i] = i*n/size
if rank == 0:
    val = 0
    for i in range(0,nm[1]):
        val = val+f(float(i)/float(n))
    val = val*2
    pi[0] = (float(2)/n)*(float(1)+val)
    print name, "rank", rank, "calculated", pi[0]
    for i in range(1, size):
        pi[i] = comm.recv(source=i, tag=i)
    number = sum(pi.itervalues())
    number = "%.20f" %(number)
    import time
    time.sleep(0.3)
    print "Pi is approximately", number
for proc in range(1, size):
    if proc == rank:
        val = 0
        for i in range(nm[proc]+1,nm[proc+1]):
            val = val+f(float(i)/float(n))
        val = val*2
        pi[proc] = (float(2)/n)*(float(1)+val)
        comm.send(pi[proc], dest=0, tag = proc)
        print name, "rank", rank, "calculated", pi[proc]

      

Original Python version:

#!/usr/bin/python
n = 1000000
def f(x):
    return (1-(float(x)**2))**float(0.5)
val = 0
for i in range(n):
    i = i+1
    val = val+f(float(i)/float(n))
val = val*2
pi = (float(2)/n)*(float(1)+val)
print pi

      



1 answer


Your code estimates pi by computing the area of a quarter of the unit disc, that is, the integral of f(x) = (1 - x**2)**0.5 over [0, 1], using the trapezoid rule.

The problem with your code is that the ranges of i for the different processes do not cover all the points. Indeed, use a small n and print i to see what happens. For instance, for i in range(nm[proc]+1,nm[proc+1]): must be changed to for i in range(nm[proc],nm[proc+1]):, otherwise i = nm[proc] is never handled. Moreover, in pi[0] = (float(2)/n)*(float(1)+val) and pi[proc] = (float(2)/n)*(float(1)+val), the term float(1) comes from x = 0 in the integral, but it is counted once in every process! Since the number of these errors is directly tied to the number of processes, increasing the number of processes decreases the accuracy, which is the symptom you reported.
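
As a quick illustration, here is a minimal sketch with hypothetical values n = 8 and size = 2 (not taken from your runs) that reproduces only the partition logic of the original code, so the gap becomes visible:

# Sketch only: hypothetical n = 8, size = 2, reproducing the original
# partition (// keeps the Python 2 integer-division behaviour).
n = 8
size = 2
nm = dict()
for i in range(1, size + 1):
    if i == size:
        nm[i] = (i * n // size) + 1
    else:
        nm[i] = i * n // size
print(list(range(0, nm[1])))            # rank 0 sums i = 0, 1, 2, 3
print(list(range(nm[1] + 1, nm[2])))    # rank 1 sums i = 5, 6, 7, 8, so i = 4 is never summed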

A broadcast corresponds to the situation where all processes of the communicator must receive the same piece of data from a given process. Here, on the contrary, the data of all the processes must be combined, using a sum, into a result made available to a single process (called the "root"). The latter operation is called a reduction and is performed by comm.Reduce().
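
For contrast, here is a minimal mpi4py sketch of the two collectives (the variable names are illustrative and not taken from your code); it shows why a broadcast is not what the pi computation needs, while a reduction is:

from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Broadcast: the root sends the same object to every process.
data = 3.14 if rank == 0 else None
data = comm.bcast(data, root=0)        # now every rank holds 3.14

# Reduce: every process contributes a value and the root receives the sum.
total = comm.reduce(rank, op=MPI.SUM, root=0)
if rank == 0:
    print(total)                       # sum of all the ranks' contributions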



Here is a code snippet based on yours, using comm.Reduce() instead of send() and recv():

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
name = MPI.Get_processor_name()
def f(x):
    return (1-(float(x)**2))**float(0.5)

n = 10000000

# nm[rank] and nm[rank+1] delimit the slice of indices handled by each rank;
# together the ranks cover i = 1 .. n-1 exactly once
nm = np.zeros(size+1, 'i')
nm[0] = 1
for i in range(1, size+1):
    if i == size:
        nm[i] = n
    else:
        nm[i] = (i*n)/size

# each rank sums its own slice of the quadrature points
val = 0
for i in range(nm[rank], nm[rank+1]):
    val = val + f(float(i)/float(n))

# the partial sums are combined on rank 0 by a single reduction
out = np.array(0.0, 'd')
vala = np.array(val, 'd')
comm.Reduce([vala, MPI.DOUBLE], [out, MPI.DOUBLE], op=MPI.SUM, root=0)
if rank == 0:
    number =(float(4)/n)*(out)+float(2)/n
    number = "%.20f" %(number)
    import time
    time.sleep(0.3)
    print "Pi is approximately", number

      
