Can a parallel IPython controller have local and remote ipengines?

IPython parallel docs mention:

c = Client(profile='myprofile')

      

or

c = Client('/path/to/my/ipcontroller-client.json')

      

for local ipengines (IIUC) and

c = Client('/path/to/my/ipcontroller-client.json', sshserver='me@myhub.example.com')

      

if my ipengines are on a different server.

But what do I need to do to have a single parallel IPython controller manage, say, 8 ipengines on the local node and 8 ipengines on a remote node connected via SSH?

Or is this not possible without moving to a full-blown HDFS/Hadoop setup?

My goal is to have a single client (or controller?) interface to which I can send a bunch of load-balanced computations, where I don't care where or when they run.



1 answer


The sshserver arg for Client is only for cases where the controller is not directly accessible from the client (for example, a client on a laptop and a controller behind a firewall on a remote network). The client never needs to know or care where the engines are. Also, SSH tunnels are only required when machines cannot reach each other directly. I will assume you don't need SSH tunneling, for simplicity.
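Concretely, reusing the calls from the question (a sketch; the path and login are placeholders):

from IPython import parallel

# client on the same machine or LAN as the controller: no tunnel needed
rc = parallel.Client()

# pass sshserver= only if the client cannot reach the controller directly
# (e.g. a laptop outside the controller's network):
# rc = parallel.Client('/path/to/my/ipcontroller-client.json', sshserver='me@myhub.example.com')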

The simplest case:

  • host1 is where you want to start the controller, the client, and 5 engines.
  • host2 is another compute machine on the same local network, where you want to run 8 engines.

No configuration

  • start the controller listening on all interfaces (so engines can connect from elsewhere on the LAN)

    [host1] ipcontroller --ip=*
    
          

  • (skip if using a shared filesystem) copy the connection files to host2

    [host1] rsync -av $HOME/.ipython/profile_default/security/ host2:.ipython/profile_default/security/
    
          

  • run engines on host1

    [host1] ipengine
    # or start multiple engines at once:
    [host1] ipcluster engines -n 5
    
          

  • run engines on host2

    [host2] ipengine
    # or start multiple engines at once:
    [host2] ipcluster engines -n 8
    
          

  • open the client on host1:

    [host1] ipython
    In[1]: from IPython import parallel
    In[2]: rc = parallel.Client()
    
          

You should now have access to the engines on both machines: 5 on host1 and 8 on host2.
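Since the goal is load-balanced work where you don't care which engine runs what, here is a minimal sketch of that last step (load_balanced_view and map_async are standard IPython.parallel calls; they are not part of the original answer):

In[3]: lview = rc.load_balanced_view()                    # one view over all 13 engines
In[4]: ar = lview.map_async(lambda x: x ** 2, range(32))  # tasks go to whichever engine is free
In[5]: ar.get()[:5]
Out[5]: [0, 1, 4, 9, 16]

The scheduler decides where each task runs, so it does not matter whether an engine lives on host1 or host2.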

With configuration

You can also express all of this with configuration. To initialize the configuration files:

[host1] ipython profile create --parallel

      

Tell ipcontroller to listen on all interfaces in ipcontroller_config.py:



c.HubFactory.ip = '*'

      

Tell ipcluster to start the engines via SSH on both host1 and host2 in ipcluster_config.py:

c.IPClusterEngines.engine_launcher_class = 'SSH'
c.SSHEngineSetLauncher.engines = {
    'host1': 5,
    'host2': 8,
}

      

Start everything with ipcluster:

[host1] ipcluster start

      

The SSH launcher will take care of copying the connection files to the remote machines.
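A quick way to confirm that the launcher brought everything up (same client code as in the no-configuration case):

[host1] ipython
In[1]: from IPython import parallel
In[2]: rc = parallel.Client()
In[3]: len(rc.ids)    # should be 13: 5 engines on host1 + 8 on host2

rc.ids lists the ids of all engines currently registered with the controller.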

If you need SSH tunneling, you can specify

c.IPControllerApp.ssh_server = u'host1'

      

in ipcontroller_config.py. IPython should be able to tell whether engines or clients are local to host1 and skip tunneling when it is not needed. If it cannot figure that out, you can either specify the SSH server manually where it is needed and leave it out of the config, or put it in the config and manually disable it where it is not needed, whichever is more convenient for you.
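For instance, the first option (specify the SSH server manually on the client and keep it out of the config) is just the call from the question, pointed at host1 (a sketch; the path and login are placeholders):

rc = parallel.Client('/path/to/my/ipcontroller-client.json', sshserver='me@host1')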
