How does data augmentation work in Keras?

I have the following code to augment data, using a file that lists my image paths as input:

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
import PIL.Image

def augment(file_images_path, dir_save):
    datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest',
    )

    with open(file_images_path) as f:
        images_names = [x.strip() for x in f.readlines()]
        for line in images_names:
            img = PIL.Image.open(line)
            img = img.resize((28, 28))
            x = img_to_array(img)
            x = x.reshape((1,) + x.shape)
            # The .flow() call below generates batches of randomly transformed
            # images and saves the results to the `dir_save` directory.
            i = 0
            for batch in datagen.flow(x, batch_size=1, save_to_dir=dir_save,
                                      save_prefix='augmented', save_format='tif'):
                i += 1
                if i > 2:
                    break  # otherwise the generator would loop indefinitely

      

I am very new to data augmentation in Keras, and I want to know how many image operations Keras performs on my images per iteration. For example, if I run this code on a list of 14 images, it generates 126 augmented images. If I run it on a list of 125 images, it generates 370 augmented images. Why?

1 answer


When you use data augmentation in Keras, every image you generate is a slightly modified version of the original.

Some augmentation steps have a finite number of outcomes (for example, an image is either flipped or not), so such a step can at most double the number of distinct images you have.
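To make the finite case concrete, here is a minimal sketch in plain Python. It does not use Keras, and the `horizontal_flip` helper and the tiny nested-list "image" are hypothetical stand-ins for illustration only. It shows that a flip has just two possible outcomes.

```python
def horizontal_flip(image):
    """Return a copy of the image with each row reversed (mirrored left to right)."""
    return [row[::-1] for row in image]


# A tiny 2x3 "image" stored as nested lists of pixel values (hypothetical data).
image = [
    [1, 2, 3],
    [4, 5, 6],
]

# A flip is either applied or not, so there are only two possible outcomes.
variants = {
    tuple(map(tuple, image)),
    tuple(map(tuple, horizontal_flip(image))),
}
print(len(variants))  # 2
```

For an image that is already left-right symmetric, the two outcomes coincide, so doubling is an upper bound.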



Others have a (practically) infinite number of outcomes. For example, when you specify rotation_range=40, every generated image is rotated by a randomly chosen angle between -40 and 40 degrees.
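The sampling idea can be sketched with the standard library alone. This is an illustration of drawing a uniform angle, not Keras's actual implementation, and `sample_rotation_angle` is a hypothetical helper name.

```python
import random


def sample_rotation_angle(rotation_range, rng=random):
    """Draw one angle in degrees, uniformly from [-rotation_range, rotation_range]."""
    return rng.uniform(-rotation_range, rotation_range)


rng = random.Random(0)  # seeded only so the sketch is repeatable
angles = [sample_rotation_angle(40, rng) for _ in range(5)]
for angle in angles:
    print(f"{angle:.2f}")
```

Because the angle is a continuous value, two draws almost never coincide, which is why the number of possible outputs is practically unbounded.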

Hence, with augmentation you can generate practically infinitely many different images. However, they will be highly correlated, so this is obviously not as good as actually having an infinite number of distinct images.
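As for how many files the code in the question writes: the generator itself never stops, so the count is set entirely by the `break` in the loop. The sketch below replays that loop against `fake_flow`, a hypothetical stand-in for `datagen.flow(...)`, assuming one image is saved per batch (as with batch_size=1).

```python
import itertools


def fake_flow():
    """Stand-in for datagen.flow(...): yields batches without ever stopping."""
    for batch_index in itertools.count():
        yield batch_index


def batches_per_image():
    """Replay the question's loop and count how many batches it consumes."""
    i = 0
    for _batch in fake_flow():
        i += 1
        if i > 2:
            break
    return i


per_image = batches_per_image()
print(per_image)        # 3
print(14 * per_image)   # 42
print(125 * per_image)  # 375
```

So a single run should produce three augmented images per input image. The totals reported in the question (126 and 370) differ from these per-run figures. Repeated runs into the same directory or colliding output file names could explain that, but these are guesses that nothing in the question confirms.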
