Finalizing MPI with collective functions

I am writing a simple program in C using the MPI library. The purpose of the program is as follows:

I have a group of processes that run an iterative loop. At the end of each iteration, all processes in the communicator must call two collective functions (MPI_Allreduce and MPI_Bcast). The first determines the rank of the process that generated the minimum value of num.val, and the second broadcasts the array x from that source rank, num_min.idx_v, to all processes in the communicator MPI_COMM_WORLD.

The problem is that I don't know whether a given process will finish before the collective functions are called. In each iteration, every process has a 1/10 probability of terminating. This simulates the behavior of the real program I am implementing. As soon as the first process exits, the others deadlock.

This is the code:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

typedef struct double_int{
    double val;
    int idx_v;
}double_int;

int main(int argc, char **argv)
{
    int n = 10;
    int max_it = 4000;
    int proc_id, n_proc;
    double *x = (double *)malloc(n*sizeof(double));

    MPI_Init(&argc, &argv);

    MPI_Comm_size(MPI_COMM_WORLD, &n_proc);
    MPI_Comm_rank(MPI_COMM_WORLD, &proc_id);

    srand(proc_id);

    double_int num_min;
    double_int num;

    int k;
    for(k = 0; k < max_it; k++){

        num.idx_v = proc_id;
        num.val = rand()/(double)RAND_MAX;

        if((rand() % 10) == 0){

            printf("iter %d: proc %d terminated\n", k, proc_id);

            MPI_Finalize();
            exit(EXIT_SUCCESS);
        }

        MPI_Allreduce(&num, &num_min, 1, MPI_DOUBLE_INT, MPI_MINLOC, MPI_COMM_WORLD);
        MPI_Bcast(x, n, MPI_DOUBLE, num_min.idx_v, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    exit(EXIT_SUCCESS);
}

      

Perhaps I need to create a new group and a new communicator before calling the MPI_Finalize function in the if statement? How do I solve this?



1 answer


If you have control over a process before it exits, have it send a non-blocking "exiting" flag to a rank that is guaranteed not to exit early (call it the root rank). Then, instead of the blocking MPI_Allreduce, every rank sends its value to the root rank.

The root rank can post non-blocking receives for either a flag or a value from each rank. Every rank must send one or the other. Once all ranks are accounted for, the root rank performs the reduction itself, drops the ranks that have exited from its list, and sends the result to the ranks that remain.
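To make the root-side bookkeeping concrete, here is a minimal sketch in plain C of the reduction step only. It assumes the root has already collected one message per rank for the current iteration; the MPI receives that would fill the array are left out. The names rank_report and min_active_rank are hypothetical and introduced just for illustration. Ties go to the lowest rank, which matches what MPI_MINLOC does.

```c
/* Hypothetical message each rank sends to the root once per iteration:
   either "I am exiting" or "here is my value". */
typedef struct {
    int exiting;   /* 1 if the rank announces that it is leaving */
    double val;    /* meaningful only when exiting == 0 */
} rank_report;

/* Root-side step: fold this iteration's reports into the active mask and
   return the rank holding the minimum value among the ranks still running.
   Returns -1 when no active rank contributed a value. */
int min_active_rank(const rank_report *reports, int *active, int n_proc)
{
    int best = -1;
    for (int r = 0; r < n_proc; r++) {
        if (!active[r])
            continue;              /* left in an earlier iteration */
        if (reports[r].exiting) {
            active[r] = 0;         /* drop it from all future rounds */
            continue;
        }
        /* strict '<' keeps the lowest rank on ties */
        if (best < 0 || reports[r].val < reports[best].val)
            best = r;
    }
    return best;
}
```

The root would then send the returned rank only to the entries still marked in active, using point-to-point messages, since a collective on MPI_COMM_WORLD would again wait for the ranks that have left.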



If your ranks exit without warning, I'm not sure what your options are.
