How to deal with data that does not fit into memory in PyBrain
I have a training set consisting of ~2k grayscale images, 300x400 px each. The entire collection is ~20 MB on disk. I am trying to classify these images with a PyBrain neural network. The problem is that when I load the dataset into a SupervisedDataSet, my little Python script consumes about 8 GB of memory, which is far too much.
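My rough back-of-envelope estimate of where the memory goes, assuming SupervisedDataSet keeps every sample as double-precision floats (I am not sure that is exactly how it stores them, and this does not count whatever intermediate copies are made while the dataset grows):

n_images = 2000
pixels_per_image = 300 * 400       # 120000 input values per sample
bytes_per_float = 8                # assuming float64 storage
input_bytes = n_images * pixels_per_image * bytes_per_float
print input_bytes / (1024.0 ** 3)  # ~1.8 GB for the inputs alone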
So, my questions: how can I train on this dataset on a laptop with 10 GB of RAM? Is there a way to load parts of the dataset "on demand" during training? Is there a way to split the dataset into smaller pieces and feed them to the network one at a time (see the sketch after my dataset code below for what I mean)? I couldn't find answers in the PyBrain documentation.
This is how I create the dataset:
import os
from PIL import Image
from pybrain.datasets import SupervisedDataSet

# returns [(image bytes, category)] where category = 1 for apple, category = 0 for banana
def load_images(dir):
    data = []
    for d, n, files in os.walk(dir):
        for f in files:
            category = int(f.startswith('apple_'))
            im = Image.open('{}/{}'.format(d, f))
            data.append((bytearray(im.tobytes()), category))
    return data

def load_data_set(dir):
    print 'loading images'
    data = load_images(dir)
    print 'creating dataset'
    ds = SupervisedDataSet(120000, 1)  # 120000 bytes per image (300 * 400)
    for d in data:
        ds.addSample(d[0], (d[1],))
    return ds
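For the splitting question, here is roughly what I have in mind, as a minimal sketch. The chunk size, directory layout, and network shape are placeholders I made up, and I am not sure that swapping datasets with setData between passes is the intended way to do this:

from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer

# sketch: train on one small SupervisedDataSet at a time
# instead of building one giant dataset up front
net = buildNetwork(120000, 100, 1)             # layer sizes are just placeholders
trainer = BackpropTrainer(net)

chunk_dirs = ['images/part1', 'images/part2']  # made-up directory layout
for epoch in range(10):
    for chunk_dir in chunk_dirs:
        ds = load_data_set(chunk_dir)          # reuses the function above
        trainer.setData(ds)                    # point the trainer at this chunk
        trainer.train()                        # one pass over the chunk
        del ds                                 # let the chunk be garbage collected

Would something like this even work, or does it break the training somehow?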
Thanks for any help.