How to deal with data that does not fit into memory in pybrain

I have a training set consisting of ~2k grayscale images, 300x400 px each. The entire collection is ~20 MB on disk. I am trying to classify these images with a pybrain neural network. The problem is that when I load the dataset into a SupervisedDataSet, my little Python script consumes about 8 GB of memory, which is far too much.

So, my questions are: how can I train on this dataset with a laptop that has 10 GB of RAM? Is there a way to load parts of the dataset "on demand" during training? Is there a way to split the dataset into smaller pieces and feed them to the network one at a time (I sketch below what I mean by this)? I couldn't find answers in the pybrain documentation.

This is how I create the dataset:

import os
from PIL import Image
from pybrain.datasets import SupervisedDataSet


# returns a list of (image bytes, category) tuples;
# category = 1 for apple, category = 0 for banana
def load_images(dir):
    data = []
    for d, n, files in os.walk(dir):
        for f in files:
            category = int(f.startswith('apple_'))
            im = Image.open('{}/{}'.format(d, f))
            data.append((bytearray(im.tobytes()), category))

    return data


def load_data_set(dir):
    print 'loading images'
    data = load_images(dir)

    print 'creating dataset'
    ds = SupervisedDataSet(120000, 1)  # 300x400 = 120000 bytes per image
    for d in data:
        ds.addSample(d[0], (d[1],))

    return ds
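
This is roughly what I mean by feeding the network the data in smaller pieces. It is only a sketch: the buildNetwork layer sizes, the chunk size and the use of trainer.trainOnDataset are guesses on my part, and I don't know whether pybrain is actually meant to be used this way:

import os
from PIL import Image
from pybrain.datasets import SupervisedDataSet
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer

CHUNK_SIZE = 200  # images per chunk, an arbitrary guess


def list_images(dir):
    # collect (path, category) pairs without loading any pixel data yet
    paths = []
    for d, n, files in os.walk(dir):
        for f in files:
            paths.append((os.path.join(d, f), int(f.startswith('apple_'))))
    return paths


def train_in_chunks(dir):
    net = buildNetwork(120000, 100, 1)  # layer sizes are placeholders
    trainer = BackpropTrainer(net)
    paths = list_images(dir)
    for start in range(0, len(paths), CHUNK_SIZE):
        # build a small dataset holding only this chunk of images
        ds = SupervisedDataSet(120000, 1)
        for path, category in paths[start:start + CHUNK_SIZE]:
            im = Image.open(path)
            ds.addSample(bytearray(im.tobytes()), (category,))
        trainer.trainOnDataset(ds, 1)  # one epoch per chunk -- is that reasonable?
        del ds  # hoping this frees the memory before the next chunk
    return net

Is this a valid pattern with pybrain, or does the trainer need to see the whole dataset at once?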


Thanks for any help.
