TensorFlow: Using 2 GPUs Simultaneously

First, I am still new to TensorFlow. I am using v0.9 and trying to use the 2 GPUs installed on our machine. Here's what's going on:

  • When I run a training script on the machine, it only runs on one of the two GPUs; by default it takes the first one, gpu:0.
  • When I run a second training script on the second GPU (after making the necessary changes, i.e. with tf.device(...)) while keeping the first process on the first GPU, TensorFlow kills the first process and uses only the second GPU to run the second process. So it seems like only one process at a time is allowed by TensorFlow?

What I need: to run two separate training scripts for two different models on two different GPUs installed on the same machine. Am I missing something here? Is this the expected behavior? Do I have to use distributed TensorFlow on the local machine to do this?

2 answers


So it seems like only one process at a time is allowed by tensorflow?

Nope, there is no such limit.

Is this the expected behavior? Do I have to go through a distributed tensorflow on my local machine to do this?

This is not the expected behavior; it is probably a configuration problem, since what you want to do is quite possible (I am running such a setup right now).


First, CUDA uses the environment variable CUDA_VISIBLE_DEVICES which, as you might guess, sets the GPUs that are visible to the process.

This means that if you want to run two processes on different GPUs, the easiest way is to open two consoles and do:

single GPU process (#1):

export CUDA_VISIBLE_DEVICES=0
./train.py

single GPU process (#2):

export CUDA_VISIBLE_DEVICES=1
./train.py

I'm assuming your CUDA_VISIBLE_DEVICES is somehow set to 0 (or 1), which would actually cause this problem.
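The two-console approach can also be driven from a single launcher script. A minimal sketch, assuming a Python launcher (the `launch_with_gpu` helper is my own name, and the stand-in child process replaces `./train.py` so the sketch runs anywhere):

```python
import os
import subprocess
import sys

# Hypothetical launcher mirroring the two-console approach: start one
# training process per GPU, each seeing only its own device via
# CUDA_VISIBLE_DEVICES, so neither process can touch the other's GPU.
def launch_with_gpu(gpu_ids, cmd):
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = gpu_ids
    return subprocess.Popen(cmd, env=env)

# Stand-in for ./train.py: a child that just prints the devices it sees.
child = [sys.executable, "-c",
         "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]

procs = [launch_with_gpu("0", child), launch_with_gpu("1", child)]
for p in procs:
    p.wait()  # each child prints its own device id
```

Because the variable is set in each child's environment rather than exported globally, the two trainings stay isolated without needing two consoles.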

If you want to use both GPUs for the same process, you can run:

Dual GPU process:

export CUDA_VISIBLE_DEVICES=0,1
./train.py

or even:

CPU-only process (GPUs disabled):

export CUDA_VISIBLE_DEVICES=
./train.py

Hope it helps, pltrdy


TensorFlow tries to allocate memory on every GPU it sees.

To get around this, restrict each script to a single (different) GPU using the CUDA_VISIBLE_DEVICES environment variable, like this:



CUDA_VISIBLE_DEVICES=0 python script_one.py
CUDA_VISIBLE_DEVICES=1 python script_two.py

In script_one.py and script_two.py, use tf.device("/gpu:0") to place ops on the only GPU each process sees.
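It is worth noting the remapping this implies: even under CUDA_VISIBLE_DEVICES=1, the process still addresses its single visible GPU as "/gpu:0". A plain-Python sketch of that mapping (the `visible_to_physical` helper is illustrative, not part of TensorFlow or CUDA):

```python
import os

def visible_to_physical(visible_index, env=None):
    """Map an in-process device index (the 0 in "/gpu:0") to a
    physical GPU id, following CUDA_VISIBLE_DEVICES semantics."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:           # unset: all GPUs visible, identity map
        return visible_index
    ids = [int(x) for x in value.split(",") if x.strip()]
    return ids[visible_index]   # IndexError if that GPU is hidden

# script_two.py runs under CUDA_VISIBLE_DEVICES=1, so its "/gpu:0"
# is physical GPU 1:
print(visible_to_physical(0, {"CUDA_VISIBLE_DEVICES": "1"}))  # → 1
```

This is why both scripts can hard-code "/gpu:0": the environment variable, not the device string, decides which physical GPU each one gets.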
