Julia remote worker using "machine file" - ERROR: connect: host unreachable (EHOSTUNREACH)
I am trying to create a remote worker pool for parallel processing in Julia. The "driver" machine runs Ubuntu 14.04 with the following configuration:
julia> versioninfo()
Julia Version 0.3.1
Commit c03f413 (2014-09-21 21:30 UTC)
Platform Info:
System: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
WORD_SIZE: 64
BLAS: libopenblas (NO_LAPACK NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: liblapack.so.3
LIBM: libopenlibm
LLVM: libLLVM-3.3
And the remote machine is running CentOS 7.0:
julia> versioninfo()
Julia Version 0.3.1
Platform Info:
System: Linux (x86_64-redhat-linux)
CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
WORD_SIZE: 64
BLAS: libopenblas (NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblasp.so.0
LIBM: libopenlibm
LLVM: libLLVM-3.3
I have SSH keys configured for passwordless login (which works fine from the command line), but when I try to run Julia with the CentOS machine as a remote worker, I get this:
corey@flash:~/rti_julia$ julia --machinefile machinefile
ERROR: connect: host is unreachable (EHOSTUNREACH)
in wait at ./task.jl:284
in wait at ./task.jl:194
in stream_wait at stream.jl:263
in wait_connected at stream.jl:301
in Worker at multi.jl:113
in create_worker at multi.jl:1064
in start_cluster_workers at multi.jl:1028
in addprocs_internal at multi.jl:1234
in addprocs at multi.jl:1244
in process_options at ./client.jl:240
in _start at ./client.jl:354
in _start_3B_1714 at /usr/bin/../lib/x86_64-linux-gnu/julia/sys.so
corey@flash:~/rti_julia$ Master process (id 1) could not connect within 60.0 seconds.
exiting.
I checked /var/log/messages and /var/log/secure on the CentOS machine, and they show that the SSH client connected successfully.
I suspect the worker process is being started on the remote (CentOS) machine, but the master process on the Ubuntu machine is for some reason unable to connect to the worker spawned there. (Hence the status message I receive at the end: "Master process (id 1) could not connect within 60.0 seconds. exiting.")
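One way to test this suspicion independently of Julia is to try a plain TCP connection from the Ubuntu machine to the CentOS machine and see how it fails. The following is only a minimal sketch in Python; the function name `probe` is made up for illustration, and the host and port to pass in are assumptions (they would have to be whatever address and port the worker actually listens on).

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Attempt a single TCP connection and classify the outcome.

    `host` and `port` are supplied by the caller; nothing here is
    specific to Julia, this only exercises the network path.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply at all within the timeout (packets silently dropped).
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # EHOSTUNREACH / ENETUNREACH: same class of error as in the trace above.
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "unreachable"
        return "error: %s" % exc
```

Calling, for example, `probe("centos-box", 9009)` from the Ubuntu machine (both values are placeholders) and comparing the result with a probe of the SSH port 22 would show whether only the worker's port is affected. An "unreachable" result on one port while SSH on the same host works would suggest that something between the two machines, possibly a firewall, is rejecting that port rather than the host being down.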
The odd part is that if I use the CentOS machine as the "master" and the Ubuntu machine as the remote worker, everything works fine.
What can I do to get it to work from the Ubuntu machine? Thanks.