MPI_Gather columns

I have an array that is distributed by columns among processes for my calculation. Afterwards I want to gather this array into a single process (0).

Each process stores its own columns in an array A, and process 0 has an array F to collect the data into. F is n * n in size; each process holds part_size columns, so the local A arrays are n * part_size. The columns are distributed cyclically across the processes: column 0 goes to p0, column 1 to p1, column 2 back to p0, and so on.
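
To make the layout concrete, here is roughly what I have, written with compile-time constants just for illustration (n = 4 and two processes in my test, so part_size = 2):

#define N         4             /* matrix dimension              */
#define NPROCS    2             /* number of processes           */
#define PART_SIZE (N / NPROCS)  /* columns held by each process  */

float A[N][PART_SIZE];  /* local columns, stored row-major (C order) */
float F[N][N];          /* full matrix, only used on process 0       */
/* cyclic mapping: global column c belongs to process c % NPROCS */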

I have created new datatypes for sending and receiving columns.

In all processes:

MPI_Type_vector(n, 1, part_size, MPI::FLOAT, &col_send);
MPI_Type_commit(&col_send);

      

In process 0:

MPI_Type_vector(n, 1, n, MPI::FLOAT, &col_recv);
MPI_Type_commit(&col_recv);

      

Now I would like to collect an array like this:

MPI_Gather(&A, part_size, col_send, &F, part_size, col_recv, 0, MPI::COMM_WORLD);

      

However, the result is not what I expect. My example has n = 4 and two processes. The values from p0 should end up in columns 0 and 2 of F, and the values from p1 in columns 1 and 3. Instead, both of p0's columns are stored in columns 0 and 1, and the values from p1 do not show up at all.

0: F[0][0]: 8.31786
0: F[0][1]: 3.90439
0: F[0][2]: -60386.2
0: F[0][3]: 4.573e-41
0: F[1][0]: 0
0: F[1][1]: 6.04768
0: F[1][2]: -60386.2
0: F[1][3]: 4.573e-41
0: F[2][0]: 0
0: F[2][1]: 8.88266
0: F[2][2]: -60386.2
0: F[2][3]: 4.573e-41
0: F[3][0]: 0
0: F[3][1]: 0
0: F[3][2]: -60386.2
0: F[3][3]: 4.573e-41

      

I admit I am at a loss here. I have obviously misunderstood how MPI_Gather or MPI_Type_vector works and lays out the values. Can anyone point me in the right direction? Any help would be much appreciated.





1 answer


The problem I see is that a datatype created with MPI_Type_vector() has an extent that runs from its first to its last element. For example, the extent of your col_recv datatype spans from > to < (I hope this mask representation is clear enough):

>x . . .
 x . . .
 x . . .
 x<. . .

      

That is an extent of 13 MPI_FLOAT elements (read the mask row by row, this is C ordering). Receiving two of them will result in:

>x . . .
 x . . .
 x . . .
 x y . .
 . y . .
 . y . .
 . y . .

      

This is clearly not what you want.
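
If you want to verify this, you can ask MPI for the extent directly. A minimal standalone sketch, assuming n = 4 as in your example:

#include <mpi.h>
#include <stdio.h>

/* Print lower bound and extent of the col_recv type from the question. */
int main(int argc, char **argv) {
  MPI_Datatype col_recv;
  MPI_Aint lb, extent;

  MPI_Init(&argc, &argv);

  MPI_Type_vector(4, 1, 4, MPI_FLOAT, &col_recv);
  MPI_Type_commit(&col_recv);

  MPI_Type_get_extent(col_recv, &lb, &extent);
  /* expected: extent == ((4-1)*4 + 1) * sizeof(float) = 13 floats = 52 bytes */
  printf("lb = %ld, extent = %ld bytes\n", (long)lb, (long)extent);

  MPI_Type_free(&col_recv);
  MPI_Finalize();
  return 0;
}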

To let MPI_Gather() place the data correctly on the receiving side, you need to resize col_recv so that its extent is ONLY ONE element. You can do this with MPI_Type_create_resized():

>x<. . .
 x . . .
 x . . .
 x . . .

      

so that receiving one such column from each process places them in consecutive columns:

   x y . . 
   x y . . 
   x y . . 
   x y . . 

      

However, since each process actually sends part_size = 2 columns, receiving two of them per process results in:



   x x y y
   x x y y
   x x y y
   x x y y

      

Again, this is not what you want, although it is closer.
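
In code, that resize step looks like this (a minimal sketch; col_recv is the type from your question, the _resized name is just mine, and n is the matrix dimension):

/* Keep the layout (n floats, stride n) but shrink the extent to one float,
   so consecutive received blocks start one column apart. */
MPI_Datatype col_recv, col_recv_resized;

MPI_Type_vector(n, 1, n, MPI_FLOAT, &col_recv);
MPI_Type_create_resized(col_recv, 0, sizeof(float), &col_recv_resized);
MPI_Type_commit(&col_recv_resized);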

Since you want the columns interleaved, you need to create a more complex datatype that describes all of one process's columns at once, again with an extent of ONE element as before. The columns are strided in units of that single element, i.e. the resized column type defined above (its extent is one float, even though it covers a whole column of 4 values):

  >x<. x .
   x . x .
   x . x .
   x . x .

      

Receiving one of these from each process gives you what you want:

   x y x y
   x y x y
   x y x y
   x y x y

      

You could also do this with MPI_Type_create_darray(), since it can create datatypes suited to the block-cyclic distribution used by ScaLAPACK, of which your layout is a 1D special case.
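
For completeness, a rough sketch of how such a darray type could be built for this 1D cyclic column distribution (untested here and the names are mine; it takes the rank because the described sub-layout depends on which process's columns you want):

/* Datatype selecting, inside the full N x N C-ordered matrix, the columns
   owned by `rank` under a cyclic (block size 1) column distribution. */
int gsizes[2]   = { N, N };
int distribs[2] = { MPI_DISTRIBUTE_NONE, MPI_DISTRIBUTE_CYCLIC };
int dargs[2]    = { MPI_DISTRIBUTE_DFLT_DARG, 1 };
int psizes[2]   = { 1, NPROCS };
MPI_Datatype cyclic_cols;

MPI_Type_create_darray(NPROCS, rank, 2, gsizes, distribs, dargs, psizes,
                       MPI_ORDER_C, MPI_FLOAT, &cyclic_cols);
MPI_Type_commit(&cyclic_cols);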

I gave it a try. Here is working code for two processes:

#include <mpi.h>
#include <stdio.h>

#define N      4
#define NPROCS 2
#define NPART  (N/NPROCS)

int main(int argc, char **argv) {
  float a_send[N][NPART];    /* local columns, row-major  */
  float a_recv[N][N] = {0};  /* gathered matrix on rank 0 */
  MPI_Datatype column_send_type;
  MPI_Datatype column_recv_type;
  MPI_Datatype column_send_type1;
  MPI_Datatype column_recv_type1;
  MPI_Datatype matrix_columns_type;
  MPI_Datatype matrix_columns_type1;

  MPI_Init(&argc, &argv);
  int my_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  /* Fill the local columns with values that encode rank, row and column. */
  for(int i=0; i<N; ++i) {
    for(int j=0; j<NPART; ++j) {
      a_send[i][j] = my_rank*100+10*(i+1)+(j+1);
    }
  }

  /* One column of the local N x NPART array, resized to an extent of one float. */
  MPI_Type_vector(N, 1, NPART, MPI_FLOAT, &column_send_type);
  MPI_Type_commit(&column_send_type);

  MPI_Type_create_resized(column_send_type, 0, sizeof(float), &column_send_type1);
  MPI_Type_commit(&column_send_type1);

  /* One column of the full N x N array, also resized to one float. */
  MPI_Type_vector(N, 1, N, MPI_FLOAT, &column_recv_type);
  MPI_Type_commit(&column_recv_type);

  MPI_Type_create_resized(column_recv_type, 0, sizeof(float), &column_recv_type1);
  MPI_Type_commit(&column_recv_type1);

  /* All NPART columns of one process: every NPROCS-th column of the full
     matrix, again resized to one float so rank r starts at column r. */
  MPI_Type_vector(NPART, 1, NPROCS, column_recv_type1, &matrix_columns_type);
  MPI_Type_commit(&matrix_columns_type);

  MPI_Type_create_resized(matrix_columns_type, 0, sizeof(float), &matrix_columns_type1);
  MPI_Type_commit(&matrix_columns_type1);

  MPI_Gather(a_send, NPART, column_send_type1, a_recv, 1, matrix_columns_type1, 0, MPI_COMM_WORLD);

  if (my_rank==0) {
    for(int i=0; i<N; ++i) {
      for(int j=0; j<N; ++j) {
        printf("%4.0f  ",a_recv[i][j]);
      }
      printf("\n");
    }
  }

  MPI_Finalize();
}
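
If you save the program as, say, gather_columns.c (the name is arbitrary), it builds and runs with the usual MPI tool chain:

mpicc -o gather_columns gather_columns.c
mpirun -np 2 ./gather_columns

With two processes, rank 0's columns should land in columns 0 and 2 and rank 1's in columns 1 and 3, so the printed matrix should look roughly like:

  11   111    12   112
  21   121    22   122
  31   131    32   132
  41   141    42   142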

      









