TensorFlow Dataset API doubles protobuf file size
Summary: Using the new tf.contrib.data.Dataset doubles the size of my graph protobuf file, and I cannot render the graph in TensorBoard.
Details:
I am testing the new TensorFlow feature tf.contrib.data.Dataset along with tf.contrib.learn.Experiment. My input is defined as input functions that return feature and label tensors.
If I create my input function with tf.train.slice_input_producer as in the following code block (full code here), then the resulting graph.pbtxt file is 620M and the .meta files are about 165M in size.
def train_inputs():
    with tf.name_scope('Training_data'):
        x = tf.constant(mnist.train.images.reshape([-1, 28, 28, 1]))
        y = tf.constant(mnist.train.labels)
        sliced_input = tf.train.slice_input_producer(
            tensor_list=[x, y], shuffle=True)
        return tf.train.shuffle_batch(
            sliced_input, batch_size=batch_size,
            capacity=10000, min_after_dequeue=batch_size*10)
Now, if I create my input function with the new tf.contrib.data.Dataset.from_tensor_slices as in the following code block (full code here), then the resulting graph.pbtxt file doubles in size to 1.3G and the .meta files double in size to 330M.
def train_inputs():
    with tf.name_scope('Training_data'):
        images = mnist.train.images.reshape([-1, 28, 28, 1])
        labels = mnist.train.labels
        dataset = tf.contrib.data.Dataset.from_tensor_slices(
            (images, labels))
        dataset = dataset.repeat(None)  # Infinite
        dataset = dataset.shuffle(buffer_size=10000)
        dataset = dataset.batch(batch_size)
        iterator = dataset.make_one_shot_iterator()
        next_example, next_label = iterator.get_next()
        return next_example, next_label
Now the graph.pbtxt file is so large that TensorBoard takes a long time to parse it, and I cannot visually debug my model graph. I found in the Dataset documentation that this increase in size comes from "the contents of the array will be copied multiple times", and the documentation suggests using placeholders instead. However, in that case I would need to feed the numpy arrays into the placeholders with an active session in order to initialize the iterator:
sess.run(iterator.initializer, feed_dict={features_placeholder: features, labels_placeholder: labels})
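For reference, the placeholder-based pattern from the documentation would look roughly like this (a sketch only; features, labels and batch_size stand for my numpy arrays and batch size from above):
# Sketch of the placeholder-based pattern from the Dataset docs.
# features and labels are numpy arrays; batch_size is defined elsewhere.
features_placeholder = tf.placeholder(features.dtype, features.shape)
labels_placeholder = tf.placeholder(labels.dtype, labels.shape)

dataset = tf.contrib.data.Dataset.from_tensor_slices(
    (features_placeholder, labels_placeholder))
dataset = dataset.repeat(None)
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(batch_size)

# An initializable iterator must be run explicitly with a feed_dict,
# which is exactly the step I cannot perform inside Experiment.
iterator = dataset.make_initializable_iterator()
next_example, next_label = iterator.get_next()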
This, however, does not seem to be possible when using the tf.contrib.learn.Experiment framework.
How can I initialize the iterator when using the Experiment framework? Or is there a workaround to use the Dataset API without increasing my graph size?
I found a solution to my problem using tf.train.SessionRunHook. I create a SessionRunHook object that initializes the iterator after the session is created:
class IteratorInitializerHook(tf.train.SessionRunHook):
    def __init__(self):
        super(IteratorInitializerHook, self).__init__()
        self.iterator_initializer_func = None

    def after_create_session(self, session, coord):
        self.iterator_initializer_func(session)
The initialization function is set when the dataset iterator is created:
iterator_initializer_hook.iterator_initializer_func = \
    lambda sess: sess.run(
        iterator.initializer,
        feed_dict={images_placeholder: images,
                   labels_placeholder: labels})
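Putting it together, my input function creates the placeholders, the dataset and the iterator, sets the hook's initializer function, and returns the next-batch tensors. The sketch below follows that pattern; the helper name get_train_inputs and the placeholder shapes are only illustrative (mnist and batch_size are assumed to be defined as in the question):
def get_train_inputs(mnist):
    # Sketch: returns an input_fn together with the hook that will
    # initialize its iterator once the session exists.
    iterator_initializer_hook = IteratorInitializerHook()

    def train_inputs():
        with tf.name_scope('Training_data'):
            images = mnist.train.images.reshape([-1, 28, 28, 1])
            labels = mnist.train.labels
            # Placeholders keep the numpy data out of the graph definition,
            # so graph.pbtxt stays small.
            images_placeholder = tf.placeholder(images.dtype, images.shape)
            labels_placeholder = tf.placeholder(labels.dtype, labels.shape)
            dataset = tf.contrib.data.Dataset.from_tensor_slices(
                (images_placeholder, labels_placeholder))
            dataset = dataset.repeat(None)
            dataset = dataset.shuffle(buffer_size=10000)
            dataset = dataset.batch(batch_size)
            iterator = dataset.make_initializable_iterator()
            next_example, next_label = iterator.get_next()
            # Defer the feed_dict initialization until after session creation.
            iterator_initializer_hook.iterator_initializer_func = \
                lambda sess: sess.run(
                    iterator.initializer,
                    feed_dict={images_placeholder: images,
                               labels_placeholder: labels})
            return next_example, next_label

    return train_inputs, iterator_initializer_hook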
I then pass the hook objects via the train_monitors and eval_hooks parameters of tf.contrib.learn.Experiment.
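For reference, the wiring looks roughly like this (a sketch; estimator, train_steps and the get_test_inputs helper for the evaluation side stand in for my actual setup):
train_input_fn, train_input_hook = get_train_inputs(mnist)
eval_input_fn, eval_input_hook = get_test_inputs(mnist)  # analogous helper, assumed

experiment = tf.contrib.learn.Experiment(
    estimator=estimator,
    train_input_fn=train_input_fn,
    eval_input_fn=eval_input_fn,
    train_steps=train_steps,
    eval_steps=1,
    train_monitors=[train_input_hook],  # initializes the training iterator
    eval_hooks=[eval_input_hook])       # initializes the evaluation iterator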
The resulting graph.pbtxt file is now only 500K and the .meta files are only 244K.