Generator called at the wrong time (keras)
I am using fit_generator() in Keras 2.0.2 with a batch size of 10 and steps_per_epoch of 320, because I have 3209 samples to train on. Before the first epoch begins, the generator is called 11 times, printing:
Train -- get ind: 0 to 10
...
Train -- get ind: 100 to 110
Then, after the first batch (1/320), it prints Train -- get ind: 110 to 120, but I think it should print Train -- get ind: 0 to 10. Is my implementation of train_generator() wrong, or why is this happening?
Here is my code for the generator:
import numpy as np

BATCH_SIZE = 10
EPOCH = 10

x_train_img = img[:train_size]  # shape: (3209, 512, 512)
x_test_img = img[train_size:]   # shape: (357, 512, 512)

def train_generator():
    global x_train_img
    last_ind = 0
    while 1:
        x_train = x_train_img[last_ind:last_ind + BATCH_SIZE]
        print('Train -- get ind: ', last_ind, ' to ', last_ind + BATCH_SIZE)
        last_ind = last_ind + BATCH_SIZE
        x_train = x_train.astype('float32') / 255.
        x_train = np.reshape(x_train, (len(x_train), 512, 512, 1))
        yield (x_train, x_train)
        if last_ind >= x_train_img.shape[0]:
            last_ind = 0
def test_generator():
...
train_steps = x_train_img.shape[0] // BATCH_SIZE  # 320
test_steps = x_test_img.shape[0] // BATCH_SIZE    # 35
autoencoder.fit_generator(train_generator(),
                          steps_per_epoch=train_steps,
                          epochs=EPOCH,
                          validation_data=test_generator(),
                          validation_steps=test_steps,
                          callbacks=[csv_logger])
Also, is there a better way to write the generator? For example:
def train_generator():
    global x_train_img
    while 1:
        for i in range(0, x_train_img.shape[0], BATCH_SIZE):
            x_train = x_train_img[i:i + BATCH_SIZE]
            print('Train -- get ind: ', i, ' to ', i + BATCH_SIZE)
            x_train = x_train.astype('float32') / 255.
            x_train = np.reshape(x_train, (len(x_train), 512, 512, 1))
            yield (x_train, x_train)
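For reference, newer Keras releases (roughly 2.0.5 onward) also ship keras.utils.Sequence, which replaces the generator-state bookkeeping with indexed access; the following is only a sketch under that version assumption, not code from the question:

import numpy as np
from keras.utils import Sequence

class TrainSequence(Sequence):
    # Index-based batch provider; input equals target for an autoencoder.
    def __init__(self, images, batch_size):
        self.images = images          # expected shape: (n, 512, 512)
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch (the last batch may be smaller).
        return int(np.ceil(len(self.images) / float(self.batch_size)))

    def __getitem__(self, idx):
        batch = self.images[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch = batch.astype('float32') / 255.
        batch = np.reshape(batch, (len(batch), 512, 512, 1))
        return batch, batch

# Usage sketch: pass the Sequence where the generator went, e.g.
# autoencoder.fit_generator(TrainSequence(x_train_img, BATCH_SIZE), ...)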
By default, fit_generator() uses max_queue_size=10 (in Keras 2.0.x the argument is spelled max_q_size). So what you observed is:
- Before the epoch begins, your generator yields 10 batches to fill the queue. These are samples 0 to 100.
- Then the epoch begins, and one batch is taken from the queue to fit the model.
- The generator yields a new batch to fill the empty slot in the queue. These are samples 100 to 110.
- Then the progress bar is updated, and 1/320 is printed to the screen.
- Steps 2 and 3 run again, which prints get ind: 110 to 120.
So there is nothing wrong with the model fitting procedure. The first batch generated is indeed the first batch used to fit the model; the queue simply hides the fact that the generator is called several times to fill it before the first model update happens, as the sketch below makes visible.
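A minimal runnable sketch of this behavior (a hypothetical toy model and data, not the autoencoder from the question; assumes Keras 2.0.x, where the queue argument is spelled max_q_size, renamed max_queue_size in later releases):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

def counting_generator():
    # Prints every time fit_generator pulls a batch, making the
    # queue prefetching visible in the console output.
    i = 0
    while True:
        print('generator called for batch', i)
        i += 1
        yield np.zeros((10, 4)), np.zeros((10, 1))

model = Sequential([Dense(1, input_shape=(4,))])
model.compile(optimizer='sgd', loss='mse')

# With the default queue size, about 10 'generator called' lines appear
# before batch 1/5 is reported; max_q_size=1 makes the generator calls
# and the progress updates interleave much more closely.
model.fit_generator(counting_generator(),
                    steps_per_epoch=5,
                    epochs=1,
                    max_q_size=1)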