Creating `input_fn` from an iterator
Most of the tutorials focus on the case where the entire training set fits into memory. However, I have an iterator that acts as an endless stream of (features, labels) pairs, generating them cheaply on the fly.
When implementing input_fn for a TensorFlow Estimator, can I return the next batch from the iterator like this:
def input_fn():
    feature_batch, label_batch = next(it)
    return tf.constant(feature_batch), tf.constant(label_batch)
or should input_fn return the same (features, labels) tensors on every call?
Also, is this function called multiple times during training, as in the following pseudocode:
for i in range(max_iter):
    learn_op(input_fn())
The input_fn argument is used throughout training, but the function itself is called only once. So creating a sophisticated input_fn that goes beyond returning a constant array, as described in the tutorial, is not that easy.
TensorFlow offers two examples of such non-trivial input_fn implementations, for numpy and pandas, but they start from an in-memory array, so this won't help with your problem.
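To make the limitation concrete, the numpy helper from those links is built around arrays that already sit in memory. A minimal usage sketch, assuming the TF 1.x-era API (reachable as `tf.compat.v1.estimator.inputs.numpy_input_fn` in later releases; the array names are illustrative):

```python
import numpy as np
import tensorflow as tf

# In-memory arrays -- exactly the case that does NOT match an endless iterator.
features = np.random.rand(100, 2).astype(np.float32)
labels = np.random.randint(0, 2, size=100).astype(np.int32)

# Builds a queue-backed input_fn from the arrays
# (plain tf.estimator.inputs.numpy_input_fn in 1.x releases).
input_fn = tf.compat.v1.estimator.inputs.numpy_input_fn(
    x={"x": features},
    y=labels,
    batch_size=16,
    num_epochs=None,  # cycle over the data forever
    shuffle=True)
```

The helper has no way to accept a generator, which is why it does not solve the streaming case.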
You can also look at their code by following the links above to see how they implement an efficient non-trivial input_fn, but you may find that it requires more code than you would like.
If you are willing to use the lower-level TensorFlow interface, things become simpler and more flexible. There is a tutorial that covers most needs, and the suggested solutions are simple(r) to implement.
In particular, if you already have an iterator that returns data, as you described in your question, using placeholders (the "Feeding" section in the previous link) should be straightforward.
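A minimal sketch of that feeding pattern, assuming the iterator yields numpy (features, labels) batches. All names are illustrative; the graph-mode API is used to match the era of this answer (via `tf.compat.v1` on newer installs):

```python
import numpy as np
import tensorflow.compat.v1 as tf  # graph-mode API, as in the original answer's era

tf.disable_eager_execution()

# Stand-in for the endless iterator from the question.
def batch_iterator(batch_size=4, dim=2):
    while True:
        features = np.random.rand(batch_size, dim).astype(np.float32)
        labels = np.random.randint(0, 2, size=batch_size).astype(np.int32)
        yield features, labels

it = batch_iterator()

# Placeholders are re-filled from the iterator on every step via feed_dict.
x = tf.placeholder(tf.float32, shape=[None, 2])
y = tf.placeholder(tf.int32, shape=[None])
loss = tf.reduce_mean(tf.reduce_sum(x, axis=1) * tf.cast(y, tf.float32))

with tf.Session() as sess:
    for _ in range(10):
        feature_batch, label_batch = next(it)
        sess.run(loss, feed_dict={x: feature_batch, y: label_batch})
```

Each training step pulls a fresh batch from the iterator, so nothing ever has to fit into memory at once.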
I found a pull request that converts a generator into an input_fn:
https://github.com/tensorflow/tensorflow/pull/7045/files
The relevant part:
def _generator_input_fn():
    """generator input function."""
    queue = feeding_functions.enqueue_data(
        x,
        queue_capacity,
        shuffle=shuffle,
        num_threads=num_threads,
        enqueue_size=batch_size,
        num_epochs=num_epochs)
    features = (queue.dequeue_many(batch_size) if num_epochs is None
                else queue.dequeue_up_to(batch_size))
    if not isinstance(features, list):
        features = [features]
    features = dict(zip(input_keys, features))
    if target_key is not None:
        if len(target_key) > 1:
            target = {key: features.pop(key) for key in target_key}
        else:
            target = features.pop(target_key[0])
        return features, target
    return features
return _generator_input_fn
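The tail end of that snippet, zipping the dequeued tensors into a feature dict and popping out the target, can be illustrated with plain Python values (a toy illustration of the data-shaping logic only; the queue parts need a TF session, and the key names here are made up):

```python
# Toy illustration of the feature/target splitting at the end of the
# PR snippet above, using plain lists instead of dequeued tensors.
input_keys = ["age", "income", "label"]
dequeued = [[21, 35], [50.0, 80.0], [0, 1]]  # one batch column per key

# Pair each key with its batch column.
features = dict(zip(input_keys, dequeued))

target_key = ["label"]
if target_key is not None:
    if len(target_key) > 1:
        # multiple targets -> return them as a dict
        target = {key: features.pop(key) for key in target_key}
    else:
        # single target -> pop it out of the feature dict
        target = features.pop(target_key[0])

# features == {"age": [21, 35], "income": [50.0, 80.0]}
# target == [0, 1]
```

This is the same (features, target) pair shape that an Estimator expects from an input_fn.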