Errors when using ompi-server to connect independently started processes

I am new to Open MPI and still learning how it works. I want to start a process that connects, through ompi-server, to a previously started process that may be running on another node, but I keep getting errors from the client. After hours of searching for an answer, I'm finally asking here. First, I start ompi-server.

ompi-server --no-daemonize -d -r +
[kurenai:15711] procdir: /tmp/openmpi-sessions-barronj@kurenai_0/30031/0/0
[kurenai:15711] jobdir: /tmp/openmpi-sessions-barronj@kurenai_0/30031/0
[kurenai:15711] top: openmpi-sessions-barronj@kurenai_0
[kurenai:15711] tmp: /tmp
[kurenai:15711] sess_dir_cleanup: job session dir does not exist
[kurenai:15711] procdir: /tmp/openmpi-sessions-barronj@kurenai_0/30031/0/0
[kurenai:15711] jobdir: /tmp/openmpi-sessions-barronj@kurenai_0/30031/0
[kurenai:15711] top: openmpi-sessions-barronj@kurenai_0
[kurenai:15711] tmp: /tmp
1968111616.0;tcp://192.168.1.219:55602
[kurenai:15711] [[30031,0],0] orte-server: up and running!

      

Then I start the server process.

mpirun -np 1 --hostfile ~/mpi-hosts --ompi-server "1968111616.0;tcp://192.168.1.219:55602" /home/barronj/ompi_test/port_server
port = 1982005248.0;tcp://192.168.1.219:38916+1982005249.0;tcp://192.168.1.219:41605:300

      

Here is the relevant code that is run by the server.

try {
    MPI::Open_port(MPI::INFO_NULL, port);
} catch (MPI::Exception e) {
    fprintf(stderr, "Server open port error (%d): %s\n", e.Get_error_code(), e.Get_error_string());
    MPI::Finalize();
    return EXIT_FAILURE;
}

MPI::Info info = MPI::Info::Create();
info.Set("ompi_global_scope", "true");

try {
    MPI::Publish_name("test_service", info, port);
} catch (MPI::Exception e) {
    fprintf(stderr, "Server service publish error (%d): %s\n", e.Get_error_code(), e.Get_error_string());
    info.Free();
    MPI::Close_port(port);
    MPI::Finalize();
    return EXIT_FAILURE;
}

info.Free();

printf("port = %s\n", port);

try {
    intercomm = MPI::COMM_SELF.Accept(port, MPI::INFO_NULL, 0);
} catch (MPI::Exception e) {
    fprintf(stderr, "Server accept error (%d): %s\n", e.Get_error_code(), e.Get_error_string());
    MPI::Unpublish_name("test_service", MPI::INFO_NULL, port);
    MPI::Close_port(port);
    MPI::Finalize();
    return EXIT_FAILURE;
}

      

On another node, I run the client and get an error.

mpirun -np 1 --hostfile ~/mpi-hosts --ompi-server "1968111616.0;tcp://192.168.1.219:55602" /home/barronj/ompi_test/port_client
barronj@kurenai password:
Client found test_service on port, 1982005248.0;tcp://192.168.1.219:38916+1982005249.0;tcp://192.168.1.219:41605:300
[athena:07039] [[28058,0],0]-[[30243,0],0] mca_oob_tcp_peer_send_handler: invalid connection state (6) on socket 19

      

As you can see, the service and port are found, but the connection attempt fails with an error. Here is the relevant client code.

try {
    MPI::Lookup_name("test_service", MPI_INFO_NULL, port);
} catch (MPI::Exception e) {
    fprintf(stderr, "Service lookup error (%d): %s\n", e.Get_error_code(), e.Get_error_string());
    MPI::Finalize();
    return EXIT_FAILURE;
}

printf("Client found test_service on port, %s\n", port);

try {
    intercomm = MPI::COMM_SELF.Connect(port, MPI_INFO_NULL, 0);
} catch (MPI::Exception e) {
    fprintf(stderr, "Client connect error (%d): %s\n", e.Get_error_code(), e.Get_error_string());
    MPI::Finalize();
    return EXIT_FAILURE;
}

      

Since I'm a beginner, I haven't been able to figure out what is wrong. I've tried using MPI::COMM_WORLD instead of MPI::COMM_SELF, but that didn't fix it.
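
For reading the client error, it may help to map the numeric URIs onto the `[[family,job],vpid]` names that appear in the logs. The sketch below assumes the number before the dot in a URI is a 32-bit job ID whose upper 16 bits are the job family and whose lower 16 bits are the local job number. That layout is an assumption, not something confirmed from Open MPI documentation here, although it is consistent with the output above (1968111616 >> 16 = 30031, the ompi-server's `[[30031,0],0]`). The helper names `split_jobid` and `log_name` are hypothetical and are not part of Open MPI.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not an Open MPI API.
// ASSUMPTION: upper 16 bits = job family, lower 16 bits = local job number.
struct JobId {
    std::uint32_t family;
    std::uint32_t local;
};

constexpr JobId split_jobid(std::uint32_t jobid) {
    return JobId{jobid >> 16, jobid & 0xFFFFu};
}

// Sanity check against the ompi-server output shown earlier in the post.
static_assert(split_jobid(1968111616u).family == 30031u,
              "ompi-server URI should map to job family 30031");

// Turns the leading "<jobid>.<vpid>;" part of a URI into the log-style name
// "[[family,local],vpid]". Returns an empty string if the separators are
// missing; std::stoul throws if the job ID is not numeric.
std::string log_name(const std::string& uri) {
    const std::size_t dot = uri.find('.');
    const std::size_t semi = uri.find(';');
    if (dot == std::string::npos || semi == std::string::npos || dot > semi) {
        return "";
    }
    const auto jobid = static_cast<std::uint32_t>(std::stoul(uri.substr(0, dot)));
    const std::string vpid = uri.substr(dot + 1, semi - dot - 1);
    const JobId id = split_jobid(jobid);
    return "[[" + std::to_string(id.family) + "," + std::to_string(id.local) +
           "]," + vpid + "]";
}
```

Under that assumption, `log_name("1982005248.0;tcp://192.168.1.219:38916")` gives `[[30243,0],0]`, which is the second name in the `mca_oob_tcp_peer_send_handler` line. If the layout holds, the failing socket would be between the client's mpirun and the server's mpirun rather than between either one and ompi-server.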

I'm not sure if this is related, but I also tried adding the --wait-for-server option when launching the client.

mpirun -np 1 --hostfile ~/mpi-hosts --ompi-server "1968111616.0;tcp://192.168.1.219:55602" --wait-for-server /home/barronj/ompi_test/port_client
--------------------------------------------------------------------------
mpirun was instructed to wait for the requested ompi-server, but was unable to
establish contact with the server during the specified wait time:

Server uri:  1968111616.0;tcp://192.168.1.219:55602
Timeout time: 10

Error received: Not supported

Please check to ensure that the requested server matches the actual server
information, and that the server is in operation.
--------------------------------------------------------------------------

      

Adding this option to the server's mpirun command produces the same error.

I also tried giving --ompi-server a file containing the URI instead of copying and pasting the URI itself. That produces the same errors.

Any help is appreciated. Thanks.
