TensorFlow GPU
I need help optimizing a custom TensorFlow model. I have a 40GB ZLIB-compressed .tfrecords file containing my training data. Each sample consists of two 384x512x3 images and a 384x512x2 flow field. I am loading my data like this:
num_threads = 16
reader_kwargs = {'options': tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.ZLIB)}
data_provider = slim.dataset_data_provider.DatasetDataProvider(
    dataset,
    num_readers=num_threads,
    reader_kwargs=reader_kwargs)
image_a, image_b, flow = data_provider.get(['image_a', 'image_b', 'flow'])
image_as, image_bs, flows = tf.train.batch(
    [image_a, image_b, flow],
    batch_size=dataset_config['BATCH_SIZE'],  # 8
    capacity=dataset_config['BATCH_SIZE'] * 10,
    num_threads=num_threads,
    allow_smaller_final_batch=False)
However, I am only getting 0.25 to 0.30 global steps per second, which is very slow.
My TensorBoard graph for the parallel reader shows its queue at 99%-100% capacity, while my GPU utilization over time (% per second) is low. It looks like the GPU is being starved for data, but I'm not sure how to fix this. I've tried increasing and decreasing the number of threads, but it doesn't seem to make any difference. I am training on an NVIDIA K80 GPU with 4 CPUs and 61GB of RAM.
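For scale, here is a quick estimate of the data rate this implies (assuming uint8 image pixels and float32 flow values, which I have not verified against the on-disk encoding):

```python
# Back-of-the-envelope data rate, assuming uint8 image pixels and
# float32 flow values (my guess at the on-disk dtypes).
H, W = 384, 512

bytes_per_image = H * W * 3       # one uint8 RGB image
bytes_per_flow = H * W * 2 * 4    # one float32 2-channel flow field
bytes_per_sample = 2 * bytes_per_image + bytes_per_flow

steps_per_sec = 0.30              # observed upper bound
batch_size = 8
mib_per_sec = steps_per_sec * batch_size * bytes_per_sample / 2**20

print(bytes_per_sample)           # 2752512 bytes, about 2.6 MiB per sample
print(round(mib_per_sec, 1))      # about 6.3 MiB/s reaching the model
```

So only around 6 MiB/s of decoded data is reaching the model, far below what the disk or the GPU should sustain.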
How can I make this train faster?
If your examples are small, using the DatasetDataProvider will not give satisfying results: it reads only one example at a time, which can be the bottleneck. I have already filed a feature request on GitHub.

In the meantime, you will have to roll your own input queue that uses read_up_to:
batch_size = 10000
num_tfrecords_at_once = 1024
num_threads = 16

# Queue of input files; pass the ZLIB options to the reader so the
# compressed records decode correctly.
filename_queue = tf.train.string_input_producer(filenames)
reader = tf.TFRecordReader(
    options=tf.python_io.TFRecordOptions(
        tf.python_io.TFRecordCompressionType.ZLIB))
# This is where the magic happens: read up to 1024 serialized records at once.
_, records = reader.read_up_to(filename_queue, num_tfrecords_at_once)
# Batch the serialized records with enqueue_many=True, since `records`
# already contains many examples.
batch_serialized_example = tf.train.shuffle_batch(
    [records],
    num_threads=num_threads,
    batch_size=batch_size,
    capacity=10 * batch_size,
    min_after_dequeue=2 * batch_size,
    enqueue_many=True)
# Parse the whole batch in one op instead of calling
# tf.parse_single_example once per record.
parsed = tf.parse_example(
    batch_serialized_example,
    features=whatever_features_you_have)
# Use parsed['feature_name'] etc. below
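As a sketch of what the feature spec might look like for the dataset in the question (hypothetical key names, assuming the images and flow were serialized as raw bytes, and using `batch_serialized_example` from the snippet above):

```python
# Hypothetical feature spec; the real keys and encodings depend on how
# the TFRecords were originally written.
features = {
    'image_a': tf.FixedLenFeature([], tf.string),
    'image_b': tf.FixedLenFeature([], tf.string),
    'flow': tf.FixedLenFeature([], tf.string),
}

parsed = tf.parse_example(batch_serialized_example, features=features)

# Decode the raw bytes and restore the known shapes.
image_a = tf.reshape(tf.decode_raw(parsed['image_a'], tf.uint8), [-1, 384, 512, 3])
image_b = tf.reshape(tf.decode_raw(parsed['image_b'], tf.uint8), [-1, 384, 512, 3])
flow = tf.reshape(tf.decode_raw(parsed['flow'], tf.float32), [-1, 384, 512, 2])
```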