Generator called at the wrong time (keras)

I am using fit_generator() in Keras 2.0.2 with a batch size of 10 and steps_per_epoch of 320, because I have 3209 samples to train on. Before the first epoch begins, the generator is called 11 times, printing:

Train -- get ind: 0 to 10
    ...    
Train -- get ind: 100 to 110

Then, after the first batch (1/320), it prints Train -- get ind: 110 to 120, but I think it should be Train -- get ind: 0 to 10. Is my implementation of train_generator() wrong, or why am I seeing this?

Here is my code for the generator:

import numpy as np

BATCH_SIZE = 10
EPOCH = 10
x_train_img = img[:train_size]  # shape: (3209, 512, 512)
x_test_img = img[train_size:]   # shape: (357, 512, 512)

def train_generator():
    global x_train_img

    last_ind = 0

    while 1:
        # Slice the next batch of raw images.
        x_train = x_train_img[last_ind:last_ind+BATCH_SIZE]
        print('Train -- get ind: ', last_ind, ' to ', last_ind+BATCH_SIZE)
        last_ind = last_ind + BATCH_SIZE
        # Scale to [0, 1] and add the channel dimension.
        x_train = x_train.astype('float32') / 255.
        x_train = np.reshape(x_train, (len(x_train), 512, 512, 1))
        yield (x_train, x_train)  # autoencoder: input is also the target
        # Wrap around once every sample has been served.
        if last_ind >= x_train_img.shape[0]:
            last_ind = 0
def test_generator():
     ...

train_steps = x_train_img.shape[0]//BATCH_SIZE #320
test_steps = x_test_img.shape[0]//BATCH_SIZE   #35

autoencoder.fit_generator(train_generator(), 
                steps_per_epoch=train_steps, 
                epochs=EPOCH,
                validation_data=test_generator(),
                validation_steps=test_steps,
                callbacks=[csv_logger] )

      

A better way to write the generator:

def train_generator():
    global x_train_img

    while 1:
        # Walk the training set in fixed strides; range() restarts at the
        # top of the while loop, so no manual wrap-around bookkeeping is needed.
        for i in range(0, x_train_img.shape[0], BATCH_SIZE):
            x_train = x_train_img[i:i+BATCH_SIZE]
            print('Train -- get ind: ', i, ' to ', i+BATCH_SIZE)
            x_train = x_train.astype('float32') / 255.
            x_train = np.reshape(x_train, (len(x_train), 512, 512, 1))
            yield (x_train, x_train)
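
Incidentally, Keras releases newer than 2.0.2 ship keras.utils.Sequence, which makes batching index-based and removes the wrap-around bookkeeping altogether. Here is a minimal sketch under that assumption (the class name TrainSequence is my own); the ceil in __len__ keeps the final partial batch of 9 samples, since 3209 = 320 * 10 + 9:

import numpy as np
from keras.utils import Sequence

class TrainSequence(Sequence):
    def __init__(self, images, batch_size):
        self.images = images
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch, including the last partial one.
        return int(np.ceil(len(self.images) / float(self.batch_size)))

    def __getitem__(self, idx):
        # Index-based access: batch idx is always the same slice.
        batch = self.images[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch = batch.astype('float32') / 255.
        batch = np.reshape(batch, (len(batch), 512, 512, 1))
        return batch, batch  # autoencoder: input is also the target

In those releases an instance, e.g. TrainSequence(x_train_img, BATCH_SIZE), can be passed to fit_generator() in place of the generator.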

      

1 answer


By default, fit_generator() uses max_queue_size=10. So what you observed is the following (a standalone simulation of this queue behavior is sketched after the list):

  • Before the epoch begins, your generator yields 10 batches to fill the queue. These are samples 0 to 100.
  • Then the epoch begins, and one batch is pulled from the queue to fit the model.
  • The generator yields a new batch to fill the vacated slot in the queue. These are samples 100 to 110.
  • Then the progress bar is updated, and the progress 1/320 is printed on screen.
  • Steps 2 and 3 happen again, so it prints get ind: 110 to 120.
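
To make this concrete, here is a self-contained simulation of the prefetch behavior (my own sketch of the mechanism, not Keras internals): a background thread keeps a queue of 10 batches topped up while the main loop consumes one batch per step, reproducing the 11 prints you saw before step 1/320:

import queue
import threading
import time

MAX_QUEUE_SIZE = 10  # mirrors fit_generator's default queue size

def batch_indices():
    ind = 0
    while True:
        print('Train -- get ind:', ind, 'to', ind + 10)
        yield ind
        ind += 10

q = queue.Queue(maxsize=MAX_QUEUE_SIZE)
gen = batch_indices()

def producer():
    while True:
        q.put(next(gen))  # blocks while the queue already holds 10 batches

threading.Thread(target=producer, daemon=True).start()
time.sleep(0.5)  # let the producer fill the queue, as happens before the epoch

for step in range(2):
    ind = q.get()  # batches come out in FIFO order: 0, then 10, ...
    print('step', step + 1, '-- model trains on ind', ind)

Running this prints eleven "get ind" lines first (0 to 10 through 100 to 110), then "step 1 -- model trains on ind 0" and, roughly simultaneously, "get ind: 110 to 120" as the producer refills the freed slot.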


So there is nothing wrong with this model-fitting procedure. The first batch generated is indeed the first batch used to fit the model. The queue simply sits in between, and the generator is called multiple times to fill that queue before the first model update occurs.
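
If you want the printed indices to track consumption more closely, you can shrink the queue when calling fit_generator(). A sketch, assuming the rest of your call stays unchanged; note that in Keras 2.0.2 the argument is spelled max_q_size (it was renamed max_queue_size in later releases):

autoencoder.fit_generator(train_generator(),
                steps_per_epoch=train_steps,
                epochs=EPOCH,
                validation_data=test_generator(),
                validation_steps=test_steps,
                callbacks=[csv_logger],
                max_q_size=1)  # queue holds a single batch

With a one-slot queue, each batch is generated just before it is consumed, so the printed indices and the progress bar advance nearly in lockstep.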
