OpenMP loop gives different results for the same sequential loop

I have serial code:

double* a = malloc((1000000) * sizeof(double));
double* b = malloc((1000000) * sizeof(double));
double totalA = 0;

for (int i = 0; i < 1000000; i++) {

    if (i == 0) {
        a[i] = sin(i);
    }

    b[i] = sin(i+1);

    if (i < 1000000-1) {
        a[i+1] = b[i];
    }

    totalA += a[i];
}

      

Output totalA

after this sequential loop 0.232883978073

.

Then I have an OpenMP version (note: all variables are reinitialized):

double* a = malloc((1000000) * sizeof(double));
double* b = malloc((1000000) * sizeof(double));
double totalA = 0;

#pragma omp parallel for reduction(+:totalA)
for (int i = 0; i < 1000000; i++) {

    if (i == 0) {
        a[i] = sin(i);
    }

    b[i] = sin(i+1);

    if (i < 1000000-1) {
        a[i+1] = b[i];
    }

    totalA += a[i];
}

      

However, the output is totalA

from this code -0.733714826779

.

I cannot understand what my life is, why it is different.

Thank.


UPDATE

After playing around for a while, it seems like the statements if

in the loop are being ignored strangely. The actual statements in blocks if

are executed on all iterations of the loop (as if the clause if

does not exist).

For example, changing the block if

to:

if (i < 555555) {
    a[i+1] = b[i];
}

      

doesn't seem to matter.

I'm still no wiser than what's going on here.

+3
c parallel-processing openmp multicore reduction


source to share


2 answers


Your code contains a race condition . The conflicting statements are the assignment a[i+1] = b[i];

that writes to the array a

and the operator totalA += a[i];

that is read from a

.

There is no guarantee in your code that the iteration responsible for writing to a specific location in the array is performed prior to the iteration that is read from that location.

To further demonstrate this problem, ordering a segment of a loop containing conflicting statements solves the problem (but most likely destructive to your performance):

#pragma omp parallel for ordered reduction(+:totalA)    
for (int i = 0; i < 1000000; i++) {

   if (i == 0) {
     a[i] = sin(i);
   }

  b[i] = sin(i+1);

  #pragma omp ordered
  {
    if (i < 1000000-1) {
      a[i+1] = b[i];
    }

    totalA += a[i];
  }
}

      

It's probably best to avoid the problem altogether and rewrite your program to get rid of the loop-related dependency:



#define N 1000000

double* a = malloc(N * sizeof(double));
double* b = malloc(N * sizeof(double));
double totalA = 0;

a[0] = sin(0);
totalA += a[0];

#pragma omp parallel for reduction(+:totalA)
for (int i = 0; i < N - 1; i++) {
  b[i] = sin(i + 1);
  a[i + 1] = b[i];
  totalA += a[i + 1];
}

b[N - 1] = sin(N);

      

Finally, please note that as a sin(0) ≈ 0.0

statement

a[0] = sin(0);
totalA += a[0];

      

can simply be replaced with

a[0] = 0.0;

      

+5


source to share


You can simplify your main loop:

a[0] = sin(0);
for (int i = 0; i < N-1; i++) {
    b[i] = sin(i+1);
    a[i+1] = b[i];
    totalA += a[i];
}
totalA += a[N-1];
b[N-1] = sin(N);

      

Let's call the iterator for thread1 i1

and thread2 i2

. Then, when, for example, i2=i1+1

write to a[i1+1]

and read to a[i2]

will be the same address, and the value read or written will depend on which stream is the first. This is a race condition.



A simple fixation is to notice that totalA = a[0] + b[0] + b[1] + ... b[N-2]

.

a[0] = sin(0);
totalA += a[0];
#pragma omp parallel for reduction(+:totalA)
for (int i = 0; i < N-1; i++) {
    b[i] = sin(i+1);
    a[i+1] = b[i];
    totalA += b[i];
}
b[N-1] = sin(N);

      

This creates 0.232883978073

for N=1000000

.

+3


source to share







All Articles
Loading...
X
Show
Funny
Dev
Pics