Julia remote worker using "machine file" - ERROR: connect: host unreachable (EHOSTUNREACH)
I am trying to create a remote worker pool for parallel processing in Julia. The "driver" machine runs Ubuntu 14.04 with the following configuration:
    versioninfo()
    Julia Version 0.3.1
    Commit c03f413 (2014-09-21 21:30 UTC)
    Platform Info:
      System: Linux (x86_64-linux-gnu)
      CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
      WORD_SIZE: 64
      BLAS: libopenblas (NO_LAPACK NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell)
      LAPACK: liblapack.so.3
      LIBM: libopenlibm
      LLVM: libLLVM-3.3
And the remote machine is running CentOS 7.0:
    versioninfo()
    Julia Version 0.3.1
    Platform Info:
      System: Linux (x86_64-redhat-linux)
      CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
      WORD_SIZE: 64
      BLAS: libopenblas (NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Sandybridge)
      LAPACK: libopenblasp.so.0
      LIBM: libopenlibm
      LLVM: libLLVM-3.3
I have SSH keys configured for passwordless login (which works fine from the command line), but when I try to run Julia with the CentOS machine as a remote worker, I get this:
    corey@flash:~/rti_julia$ julia --machinefile machinefile
    ERROR: connect: host is unreachable (EHOSTUNREACH)
     in wait at ./task.jl:284
     in wait at ./task.jl:194
     in stream_wait at stream.jl:263
     in wait_connected at stream.jl:301
     in Worker at multi.jl:113
     in create_worker at multi.jl:1064
     in start_cluster_workers at multi.jl:1028
     in addprocs_internal at multi.jl:1234
     in addprocs at multi.jl:1244
     in process_options at ./client.jl:240
     in _start at ./client.jl:354
     in _start_3B_1714 at /usr/bin/../lib/x86_64-linux-gnu/julia/sys.so
    corey@flash:~/rti_julia$ Master process (id 1) could not connect within 60.0 seconds. exiting.
I checked /var/log/messages and /var/log/secure on the CentOS machine, and they show that the SSH client connected successfully.
I suspect that a worker process is being started on the remote (CentOS) machine, but the master process on the Ubuntu machine is for some reason unable to connect to that worker. (Hence the status message I receive at the end: "Master process (id 1) could not connect within 60.0 seconds. exiting.")
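One way to test this suspicion outside Julia is a plain TCP probe run from the Ubuntu machine against the port the remote worker listens on. The following is only a minimal sketch in Python: the function name probe_tcp is made up for illustration, and the host name and port passed to it are assumptions that must be replaced with the real CentOS host and whatever port the worker actually reports.

```python
import errno
import socket


def probe_tcp(host, port, timeout=5.0):
    """Attempt a TCP connection to host:port and return a short status string.

    Hypothetical helper for illustration; it is not part of Julia or of any
    cluster tooling.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply at all within the timeout (packets silently dropped).
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Same error the Julia master reports in the trace above.
            return "EHOSTUNREACH"
        if exc.errno == errno.ECONNREFUSED:
            # Host is reachable, but nothing is listening on that port.
            return "ECONNREFUSED"
        return "error: %s" % exc
```

For example, probe_tcp("centos-host", 9009) could be called while a worker is running, where both "centos-host" and 9009 are placeholders. If the probe also returns "EHOSTUNREACH" even though SSH to the same host succeeds, that would suggest the failure lies in the network path or a packet filter between the two machines rather than in Julia itself.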
The odd part is that if I use the CentOS machine as "master" and the Ubuntu machine as the remote worker, everything works fine.
What can I do to get it to work from the Ubuntu machine? Thanks.