Memory management in the TensorFlow Dataset API

I have a training dataset that is too large to fit into memory, so my code only reads 1000 records from disk at a time. Now I would like to use TensorFlow's new Dataset API. Does the Dataset API let me specify the number of records to keep in memory, or does TensorFlow automatically manage memory so that I don't need to?

+3




3 answers


Yes. Example from the official guide (Using the Dataset API for TensorFlow Input Pipelines, https://www.tensorflow.org/programmers_guide/datasets):



filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
dataset = tf.contrib.data.TFRecordDataset(filenames)
dataset = dataset.map(...) ## Parsing data with a user specified function
dataset = dataset.shuffle(buffer_size=10000) ## 10000: size of sample/record pool for random selection
dataset = dataset.batch(32) ## 32: number of samples/records per batch (to be read into memory)
dataset = dataset.repeat() ## None: keep repeating


+2




You specify the number of records with batch_size: TF only pulls batch_size items from the file at a time. You can also call shuffle, which keeps at most buffer_size elements in memory at any given time.
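
As a toy illustration of that memory behaviour (a sketch using tf.data.Dataset.range rather than real record files): only buffer_size elements ever sit in the shuffle buffer, and only batch_size of them are produced per run.

import tensorflow as tf

## 1,000,000 toy "records"; they are never materialized all at once
dataset = tf.data.Dataset.range(1000000)
dataset = dataset.shuffle(buffer_size=1000)  ## at most ~1000 elements buffered in memory
dataset = dataset.batch(100)                 ## each run yields 100 of them

next_batch = dataset.make_one_shot_iterator().get_next()
with tf.Session() as sess:
    print(sess.run(next_batch))  ## one shuffled batch of 100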



I checked this with TFRecord files. I have 100 TFRecord files, each of which is ~10 GB (which is more than the memory on my laptop), and everything works fine.

+1




I guess dataset = dataset.prefetch(buffer_size) will do this? If buffer_size is set large enough, are all the TFRecords kept in memory? See also: Buffer_size value in Dataset.map, Dataset.prefetch and Dataset.shuffle.
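
For what it's worth, a minimal sketch of where prefetch fits (assuming it is applied after batch, so each buffered element is a whole batch): with a small buffer_size it only keeps a few extra batches around, while a very large buffer_size would indeed pull correspondingly more data into RAM.

import tensorflow as tf

filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.batch(32)
## prefetch overlaps input reading with training; buffer_size=1 here means only
## one extra batch (32 records) is buffered ahead of time, so memory use stays bounded
dataset = dataset.prefetch(buffer_size=1)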

0








