How to deal with data that does not fit into memory in pybrain

I have a training set of ~2k grayscale images, 300x400 px each. The entire collection is ~20 MB on disk. I am trying to classify these images with a pybrain neural network. The problem is that when I load the dataset into a SupervisedDataSet, my little python script consumes about 8 GB of memory, which is far too much.
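
(I suspect the blow-up happens because SupervisedDataSet stores each sample as an array of floats: 2000 images × 120000 values × 8 bytes per float64 is already ~1.8 GB before any intermediate copies, roughly 80 times the size of the raw bytes.)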

So, my questions are: How can I train on this dataset with a laptop that has 10 GB of RAM? Is there a way to load parts of the dataset "on demand" during training? Is there a way to split the dataset into smaller pieces and feed them to the network one at a time? (A rough sketch of what I have in mind follows my code below.) I couldn't find answers in the pybrain documentation.

This is how I create the dataset:

import os

from PIL import Image
from pybrain.datasets import SupervisedDataSet


# returns a list of (image bytes, category) tuples,
# where category = 1 for apple and category = 0 for banana
def load_images(dir):
    data = []
    for d, _, files in os.walk(dir):
        for f in files:
            category = int(f.startswith('apple_'))
            im = Image.open(os.path.join(d, f))
            data.append((bytearray(im.tobytes()), category))

    return data


def load_data_set(dir):
    print 'loading images'
    data = load_images(dir)

    print 'creating dataset'
    ds = SupervisedDataSet(120000, 1)  # 300 * 400 = 120000 bytes per image
    for image_bytes, category in data:
        ds.addSample(image_bytes, (category,))

    return ds
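
For what it's worth, here is a rough sketch of the chunked training I have in mind. I am not sure this is the right way to drive pybrain: it assumes the trainer can be re-pointed at a fresh dataset via BackpropTrainer's inherited setData method, and the chunk size, hidden-layer size, and the iter_chunks helper are all made up for illustration.

import os

from PIL import Image
from pybrain.datasets import SupervisedDataSet
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer


def iter_chunks(dir, chunk_size=200):
    # yields a fresh SupervisedDataSet for every chunk_size images,
    # so only one chunk has to be held in memory at a time
    ds = SupervisedDataSet(120000, 1)
    count = 0
    for d, _, files in os.walk(dir):
        for f in files:
            category = int(f.startswith('apple_'))
            im = Image.open(os.path.join(d, f))
            ds.addSample(bytearray(im.tobytes()), (category,))
            count += 1
            if count == chunk_size:
                yield ds
                ds = SupervisedDataSet(120000, 1)
                count = 0
    if count > 0:
        yield ds  # last, possibly smaller, chunk


net = buildNetwork(120000, 100, 1)  # hidden-layer size chosen arbitrarily
trainer = BackpropTrainer(net)
for epoch in range(10):
    for chunk in iter_chunks('images'):
        trainer.setData(chunk)  # point the trainer at the current chunk
        trainer.train()         # one backprop pass over this chunk

Even if something like this works, each chunk would still be converted to floats internally, so it only bounds the working set; the images would also be re-read from disk on every epoch, which I could live with.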


Thanks for any help.

python machine-learning neural-network pybrain

