Errors using ompi-server to connect independently started processes

I am new to Open MPI and am still learning how it works. I want to start a process later and have it connect, via ompi-server, to a previously started process that may be running on another node, but I keep getting errors from the client. After hours of searching for an answer, I am finally asking. First I start ompi-server:

ompi-server --no-daemonize -d -r +
[kurenai:15711] procdir: /tmp/openmpi-sessions-barronj@kurenai_0/30031/0/0
[kurenai:15711] jobdir: /tmp/openmpi-sessions-barronj@kurenai_0/30031/0
[kurenai:15711] top: openmpi-sessions-barronj@kurenai_0
[kurenai:15711] tmp: /tmp
[kurenai:15711] sess_dir_cleanup: job session dir does not exist
[kurenai:15711] procdir: /tmp/openmpi-sessions-barronj@kurenai_0/30031/0/0
[kurenai:15711] jobdir: /tmp/openmpi-sessions-barronj@kurenai_0/30031/0
[kurenai:15711] top: openmpi-sessions-barronj@kurenai_0
[kurenai:15711] tmp: /tmp
1968111616.0;tcp://192.168.1.219:55602
[kurenai:15711] [[30031,0],0] orte-server: up and running!

Then I start the MPI server process, passing it the URI that ompi-server printed.

mpirun -np 1 --hostfile ~/mpi-hosts --ompi-server "1968111616.0;tcp://192.168.1.219:55602" /home/barronj/ompi_test/port_server
port = 1982005248.0;tcp://192.168.1.219:38916+1982005249.0;tcp://192.168.1.219:41605:300

Here is the relevant code run by the server.

try {
    MPI::Open_port(MPI::INFO_NULL, port);
} catch (MPI::Exception e) {
    fprintf(stderr, "Server open port error (%d): %s\n", e.Get_error_code(), e.Get_error_string());
    MPI::Finalize();
    return EXIT_FAILURE;
}

MPI::Info info = MPI::Info::Create();
info.Set("ompi_global_scope", "true");

try {
    MPI::Publish_name("test_service", info, port);
} catch (MPI::Exception e) {
    fprintf(stderr, "Server service publish error (%d): %s\n", e.Get_error_code(), e.Get_error_string());
    info.Free();
    MPI::Close_port(port);
    MPI::Finalize();
    return EXIT_FAILURE;
}

info.Free();

printf("port = %s\n", port);

try {
    intercomm = MPI::COMM_SELF.Accept(port, MPI::INFO_NULL, 0);
} catch (MPI::Exception e) {
    fprintf(stderr, "Server accept error (%d): %s\n", e.Get_error_code(), e.Get_error_string());
    MPI::Unpublish_name("test_service", MPI::INFO_NULL, port);
    MPI::Close_port(port);
    MPI::Finalize();
    return EXIT_FAILURE;
}

On another node, I run the client and get an error.

mpirun -np 1 --hostfile ~/mpi-hosts --ompi-server "1968111616.0;tcp://192.168.1.219:55602" /home/barronj/ompi_test/port_client
barronj@kurenai password:
Client found test_service on port, 1982005248.0;tcp://192.168.1.219:38916+1982005249.0;tcp://192.168.1.219:41605:300
[athena:07039] [[28058,0],0]-[[30243,0],0] mca_oob_tcp_peer_send_handler: invalid connection state (6) on socket 19

As you can see, the client finds the service and its port, but the connection attempt then fails with an error. Here is the relevant client code.

try {
    MPI::Lookup_name("test_service", MPI::INFO_NULL, port);
} catch (MPI::Exception e) {
    fprintf(stderr, "Service lookup error (%d): %s\n", e.Get_error_code(), e.Get_error_string());
    MPI::Finalize();
    return EXIT_FAILURE;
}

printf("Client found test_service on port, %s\n", port);

try {
    intercomm = MPI::COMM_SELF.Connect(port, MPI::INFO_NULL, 0);
} catch (MPI::Exception e) {
    fprintf(stderr, "Client connect error (%d): %s\n", e.Get_error_code(), e.Get_error_string());
    MPI::Finalize();
    return EXIT_FAILURE;
}

As a beginner, I haven't been able to work out what is wrong. I also tried MPI::COMM_WORLD instead of MPI::COMM_SELF, but that did not fix it.

I'm not sure whether this is related, but I also tried adding the --wait-for-server option to the client's mpirun command.

mpirun -np 1 --hostfile ~/mpi-hosts --ompi-server "1968111616.0;tcp://192.168.1.219:55602" --wait-for-server /home/barronj/ompi_test/port_client
--------------------------------------------------------------------------
mpirun was instructed to wait for the requested ompi-server, but was unable to
establish contact with the server during the specified wait time:

Server uri:  1968111616.0;tcp://192.168.1.219:55602
Timeout time: 10

Error received: Not supported

Please check to ensure that the requested server matches the actual server
information, and that the server is in operation.
--------------------------------------------------------------------------

Adding this option to the server's mpirun command gives the same error.

I also tried giving --ompi-server a file containing the URI instead of copying and pasting the URI itself. That produces the same errors.

Any help is appreciated. Thanks.
