Julia remote worker via --machinefile fails with ERROR: connect: host is unreachable (EHOSTUNREACH)

I am trying to set up a pool of remote workers for parallel processing in Julia. The "driver" machine runs Ubuntu 14.04 with the following configuration:

julia> versioninfo()
Julia Version 0.3.1
Commit c03f413 (2014-09-21 21:30 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (NO_LAPACK NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: liblapack.so.3
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

The remote machine runs CentOS 7.0:

julia> versioninfo()
Julia Version 0.3.1
Platform Info:
  System: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblasp.so.0
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

I have SSH keys set up for passwordless login, which works fine from the command line, but when I start Julia with the CentOS machine as a remote worker, I get this:

corey@flash:~/rti_julia$ julia --machinefile machinefile 
ERROR: connect: host is unreachable (EHOSTUNREACH)
 in wait at ./task.jl:284
 in wait at ./task.jl:194
 in stream_wait at stream.jl:263
 in wait_connected at stream.jl:301
 in Worker at multi.jl:113
 in create_worker at multi.jl:1064
 in start_cluster_workers at multi.jl:1028
 in addprocs_internal at multi.jl:1234
 in addprocs at multi.jl:1244
 in process_options at ./client.jl:240
 in _start at ./client.jl:354
 in _start_3B_1714 at /usr/bin/../lib/x86_64-linux-gnu/julia/sys.so

corey@flash:~/rti_julia$ Master process (id 1) could not connect within 60.0 seconds.
exiting.

I checked /var/log/messages and /var/log/secure on the CentOS machine, and they show that the SSH client connected successfully.

I suspect a worker process does start on the remote (CentOS) machine, but for some reason the master process on the Ubuntu machine cannot connect to it. (Hence the message I get at the end: "Master process (id 1) could not connect within 60.0 seconds. exiting.")
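The `connect` and `wait_connected` frames in the trace suggest the master reaches the worker with an ordinary TCP connection to the worker's address and port. Assuming that is the case, one way to narrow things down is to attempt the same kind of connection by hand from the Ubuntu machine while something is listening on that port on the CentOS machine. The sketch below is an illustrative diagnostic, not part of Julia; the function name `probe_tcp` is hypothetical, and the host and port are placeholders for the CentOS machine's address and the port the worker listens on (which `ss -tlnp` on the CentOS machine can show while the worker is starting). If it returns `EHOSTUNREACH` even though SSH to the same host works, the rejection is specific to that TCP port. One possibility worth ruling out, though not confirmed here, is a host firewall that answers with an ICMP "host prohibited" reply, which Linux reports to the connecting side as `EHOSTUNREACH`.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt one plain TCP connection to host:port (hypothetical helper).

    Returns "ok" on success, "timeout" if nothing answered in time, or the
    symbolic errno name (e.g. "EHOSTUNREACH", "ECONNREFUSED") on failure.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake;
        # the socket is closed again as soon as the handshake succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # Packets that are silently dropped show up as a timeout, not an errno.
        return "timeout"
    except OSError as exc:
        # Map the numeric errno to its symbolic name; fall back to the message
        # (e.g. for DNS failures whose error codes are not in errno.errorcode).
        return errno.errorcode.get(exc.errno, str(exc))
```

Calling `probe_tcp` from the Ubuntu machine against the CentOS worker's address and port, and then from the CentOS machine against an Ubuntu worker, would show whether the failure is tied to one direction only, which would match the asymmetry described below.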

The odd part is that if I use the CentOS machine as the master and the Ubuntu machine as a remote worker, everything works fine.

What can I do to get this working with the Ubuntu machine as the master? Thanks.
