MPI: what to do when the number of expected MPI_Recvs is not known

I have many slave nodes that may or may not send messages to the master node, so the master currently has no way of knowing how many MPI_Recv calls to expect. For efficiency reasons, the slaves should send as few messages to the master as possible.

I found a neat trick: send an additional "done" message once no more messages are to be expected. Unfortunately, this does not work in my case, where there is a variable number of senders. Any idea how to do this? Thanks!

if(rank == 0){ //MASTER NODE
    while (1) {
        MPI_Recv(&buffer, 10, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

        if (status.MPI_TAG == DONE) break;

        /* Do stuff */
    }
}else{ //MANY SLAVE NODES
    if(some conditions){
        MPI_Send(&buffer, 64, MPI_INT, root, 1, MPI_COMM_WORLD);
    }
}

MPI_Barrier(MPI_COMM_WORLD);
MPI_Send(NULL, 1, MPI_INT, root, DONE, MPI_COMM_WORLD);

      

This doesn't work: the program still seems to be waiting in MPI_Recv.



2 answers


A simpler and more elegant option is to use MPI_IBARRIER. Have each worker send all of its messages and then call MPI_IBARRIER when it is done. On the master, loop over both an MPI_IRECV on MPI_ANY_SOURCE and an MPI_IBARRIER. When the MPI_IBARRIER completes, you know that everyone has finished, so you can cancel the MPI_IRECV and move on. The pseudocode looks something like this:



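if (master) {
  /* Start the barrier. Each process will join when it is done. */
  MPI_Ibarrier(MPI_COMM_WORLD, &requests[0]);

  do {
    /* Post a receive for the next message from any worker. */
    MPI_Irecv(..., MPI_ANY_SOURCE, ..., &requests[1]);

    /* If the index that finished is 1, we received a message.
     * Otherwise, we finished the barrier and we're done. */
    MPI_Waitany(2, requests, &index, &status);
    if (index == 1) {
      /* Do the work */
    }
  } while (index == 1);

  /* If we're done, cancel the pending receive and complete the request.
   * If the cancel did not take effect, one last message was received
   * and still has to be handled. */
  MPI_Cancel(&requests[1]);
  MPI_Wait(&requests[1], &status);
  MPI_Test_cancelled(&status, &cancelled);
  if (!cancelled) {
    /* Do the work for the last message */
  }
} else {
  /* Keep sending work back to the master until we're done.
   * MPI_Ssend returns only once the master has matched the message,
   * so nothing is still in flight when we join the barrier. */
  while( ...work is to be done... ) {
    MPI_Ssend(...);
  }

  /* When we finish, join the Ibarrier. Note that
   * you can't use an MPI_Barrier here because it
   * has to match with the MPI_Ibarrier above. */
  MPI_Ibarrier(MPI_COMM_WORLD, &request);
  MPI_Wait(&request, MPI_STATUS_IGNORE);
}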

      



1- You called MPI_Barrier in the wrong place; it should be called after MPI_Send.
2- The root should exit the loop only once it has received DONE from all other ranks (size - 1).

Code after some changes:

#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    MPI_Init(NULL, NULL);
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Status status;
    int DONE = 888;
    int buffer = 77;
    int root = 0;
    printf("here is rank %d with size=%d\n", rank, size); fflush(stdout);
    int num_of_DONE = 0;

    if (rank == 0) { // MASTER NODE
        while (1) {
            MPI_Recv(&buffer, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            printf("root recev %d from %d with tag = %d\n", buffer, status.MPI_SOURCE, status.MPI_TAG); fflush(stdout);

            // Count DONE messages and leave once every other rank has sent one.
            if (status.MPI_TAG == DONE)
                num_of_DONE++;
            printf("num_of_DONE=%d\n", num_of_DONE); fflush(stdout);
            if (num_of_DONE == size - 1)
                break;

            /* Do stuff */
        }
    } else { // MANY SLAVE NODES
        if (1) {
            buffer = 66;
            MPI_Send(&buffer, 1, MPI_INT, root, 1, MPI_COMM_WORLD);
            printf("rank %d sent data.\n", rank); fflush(stdout);
        }
    }

    // Every slave sends DONE, whether or not it sent data before.
    if (rank != 0) {
        buffer = 55;
        MPI_Send(&buffer, 1, MPI_INT, root, DONE, MPI_COMM_WORLD);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    printf("rank %d done.\n", rank); fflush(stdout);
    MPI_Finalize();
    return 0;
}

      



output:

    hosam@hosamPPc:~/Desktop$ mpicc -o aa aa.c
    hosam@hosamPPc:~/Desktop$ mpirun -n 3 ./aa
here is rank 2 with size=3
here is rank 0 with size=3
rank 2 sent data.
here is rank 1 with size=3
rank 1 sent data.
root recev 66 from 1 with tag = 1
num_of_DONE=0
root recev 66 from 2 with tag = 1
num_of_DONE=0
root recev 55 from 2 with tag = 888
num_of_DONE=1
root recev 55 from 1 with tag = 888
num_of_DONE=2
rank 0 done.
rank 1 done.
rank 2 done.

      
