Calling a subprocess in a script using mpi4py

I am unable to call an external program from a Python script in which I want to use mpi4py to distribute the workload across different processors.

Basically, I want each core to prepare some input files for the computation in a separate folder, then run an external program in that folder, wait for the output, and finally read the results and collect them.

However, I just can't seem to get the call to the external program to work. While looking for a solution, I found that the problem I am facing seems to be quite fundamental. The following simple example makes this clear:

#!/usr/bin/env python
import subprocess

subprocess.call("EXTERNAL_PROGRAM", shell=True)
subprocess.call("echo test", shell=True)


Running

./script.py

works fine (both calls work), while

mpirun -np 1 ./script.py

only outputs test. Is there any workaround for this situation? The program is definitely in my PATH, but the call also fails if I use the absolute path.

This SO question seems to be related; unfortunately, it has no answers.

EDIT:

In the original version of my question, I had not included any code using mpi4py, although I mention this module in the title. So here is a more detailed code example:

#!/usr/bin/env python

import os
import subprocess

from mpi4py import MPI


def worker(parameter=None):
    """Make new folder, cd into it, prepare the config files and execute the
    external program."""

    cwd = os.getcwd()
    calc_dir = os.path.join(cwd, "_calculation_" + parameter)
    os.makedirs(calc_dir)
    os.chdir(calc_dir)

    # Write input for simulation & execute
    subprocess.call("echo {} > input.cfg".format(parameter), shell=True)
    subprocess.call("EXTERNAL_PROGRAM", shell=True)

    # After the program is finished, do something here with the output files
    # and return the data. I'm using the input parameter as a dummy variable
    # for the processed output.
    data = parameter

    os.chdir(cwd)

    return data


def run_parallel():
    """Iterate over job_args in parallel."""

    comm = MPI.COMM_WORLD
    size = comm.Get_size()
    rank = comm.Get_rank()

    if rank == 0:
        # Here should normally be a list with many more entries, subdivided
        # among all the available cores. I'll keep it simple here, so one has
        # to run this script with mpirun -np 2 ./script.py
        job_args = ["a", "b"]
    else:
        job_args = None

    job_arg = comm.scatter(job_args, root=0)
    res = worker(parameter=job_arg)
    results = comm.gather(res, root=0)

    # results is None on every rank except root
    print(res)
    print(results)

if __name__ == '__main__':
    run_parallel()


Unfortunately, I cannot provide more details about the external executable EXTERNAL_PROGRAM, other than that it is a C++ application that is MPI-enabled. As written in the comment section below, I suspect this is the reason (or one of the reasons) why my external program call is mostly ignored.

Please note that I'm aware no one can reproduce my exact situation. Still, I was hoping that someone here has already run into similar issues and might be able to help.

For the sake of completeness: the OS is Ubuntu 14.04 and I'm using OpenMPI 1.6.5.



1 answer


In the first example, you can do this:

#!/usr/bin/env python
import subprocess

subprocess.call("EXTERNAL_PROGRAM && echo test", shell=True)




The Python script only acts as a thin wrapper around the MPI call. You could just as well write a bash script that runs "EXTERNAL_PROGRAM && echo test" and mpirun that bash script instead; it would be equivalent to mpirunning the Python script, as sketched below.
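A minimal sketch of such a wrapper, with run.sh as a hypothetical file name:

#!/bin/bash
# run.sh: chain both commands inside the single program that mpirun
# launches, so the && sequencing happens within one job.
EXTERNAL_PROGRAM && echo test

You would then launch it with mpirun -np 1 ./run.sh.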

The second example will not work if EXTERNAL_PROGRAM is MPI-enabled. mpi4py initializes MPI when it is imported, and once the MPI environment has been initialized this way, you cannot simply start another MPI program as a subprocess. What you can do is spawn it with MPI_Comm_spawn or MPI_Comm_spawn_multiple (together with the -up option to mpirun). For mpi4py, refer to the Compute PI example for spawning; it uses MPI.COMM_SELF.Spawn.
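For illustration, here is a minimal sketch of the spawning approach, assuming EXTERNAL_PROGRAM is an MPI program that retrieves the parent communicator (MPI_Comm_get_parent) and disconnects from it when done; otherwise the Disconnect() call below may block:

#!/usr/bin/env python
# Minimal sketch: launch the MPI-enabled external program via
# MPI_Comm_spawn instead of subprocess. "EXTERNAL_PROGRAM" is the
# same placeholder as in the question.
from mpi4py import MPI

# Each rank spawns its own child job; Spawn returns an
# intercommunicator connecting this process to the child ranks.
child_comm = MPI.COMM_SELF.Spawn("EXTERNAL_PROGRAM", args=[], maxprocs=1)

# ... exchange data with the child over child_comm here, if the
# external program supports it ...

child_comm.Disconnect()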
