Data parallelism in Keras

I am looking for data parallelism in Keras (TensorFlow backend), not model parallelism. I am doing video file classification and can only fit a batch size of 2 on one GPU. So I am interested in using multiple GPUs to increase the batch size, for better gradient estimates and faster training. Can you suggest an efficient way to do this?

I am using one 12GB Titan X and one 6GB Titan Black.

Thanks.



1 answer


This is one way to do it:

The method to_multi_gpu takes a model (defined with Keras 2.0 on a single GPU) and returns the same model replicated (with shared parameters) across multiple GPUs. The input to the new model is split evenly, and each slice is passed to one of the replicas. The outputs of all replicas are concatenated at the end.
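The batch-slicing rule used below can be previewed in plain NumPy before reading the Keras version. Here slice_batch_np is a hypothetical stand-in, not part of Keras; it mirrors the slice_batch function in the code that follows:

```python
import numpy as np

def slice_batch_np(x, n_gpus, part):
    # Even split along the batch axis; the last part keeps any remainder.
    L = x.shape[0] // n_gpus
    if part == n_gpus - 1:
        return x[part * L:]
    return x[part * L:(part + 1) * L]

batch = np.arange(10)
print(slice_batch_np(batch, 2, 0))  # [0 1 2 3 4]
print(slice_batch_np(batch, 2, 1))  # [5 6 7 8 9]
```

Note that because the last slice absorbs the remainder, batch sizes that are not divisible by n_gpus still work, at the cost of a slightly uneven load.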



import tensorflow as tf
from keras import backend as K
from keras.models import Model
from keras.layers import Input
from keras.layers.core import Lambda
from keras.layers.merge import Concatenate

def slice_batch(x, n_gpus, part):
    """
    Divide the input batch into [n_gpus] slices, and obtain slice number [part].
    i.e. if len(x)=10, then slice_batch(x, 2, 1) will return x[5:].
    """
    sh = K.shape(x)
    L = sh[0] // n_gpus
    if part == n_gpus - 1:
        return x[part*L:]
    return x[part*L:(part+1)*L]


def to_multi_gpu(model, n_gpus=2):
    """
    Given a keras [model], return an equivalent model which parallelizes
    the computation over [n_gpus] GPUs.

    Each GPU gets a slice of the input batch, applies the model on that slice
    and later the outputs of the models are concatenated to a single tensor, 
    hence the user sees a model that behaves the same as the original.
    """
    with tf.device('/cpu:0'):
        x = Input(model.input_shape[1:], name=model.input_names[0])

    towers = []
    for g in range(n_gpus):
        with tf.device('/gpu:' + str(g)):
            slice_g = Lambda(slice_batch, 
                             lambda shape: shape, 
                             arguments={'n_gpus':n_gpus, 'part':g})(x)
            towers.append(model(slice_g))

    with tf.device('/cpu:0'):
        merged = Concatenate(axis=0)(towers)

    return Model(inputs=[x], outputs=[merged])
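The split-apply-concatenate pattern above can be sanity-checked with a toy stand-in for the model (model_fn and parallel_apply are illustrative names, not Keras APIs): slicing the batch, applying the same function to each slice, and concatenating the results is equivalent to one pass over the whole batch.

```python
import numpy as np

def model_fn(x):
    # Toy stand-in for the replicated model: any per-sample function works.
    return 2 * x

def parallel_apply(x, n_gpus):
    # Split the batch as each GPU tower would, apply the model to each
    # slice, then concatenate the tower outputs along the batch axis.
    L = x.shape[0] // n_gpus
    slices = [x[g * L:] if g == n_gpus - 1 else x[g * L:(g + 1) * L]
              for g in range(n_gpus)]
    return np.concatenate([model_fn(s) for s in slices], axis=0)

batch = np.arange(10)
print(np.array_equal(parallel_apply(batch, 2), model_fn(batch)))  # True
```

On a real multi-GPU machine you would then wrap your single-GPU model, e.g. parallel_model = to_multi_gpu(model, n_gpus=2), and compile and fit it as usual with a correspondingly larger batch size.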

      







