Split a queue into training and test sets

I set up my pipeline starting with a filename queue, as in the following pseudocode:

filename_queue = tf.train.string_input_producer(["file0.tfrecords", "file1.tfrecords"])

pointing to TFRecords containing multiple serialized tf.train.Example images. Following the TensorFlow documentation, here is a function that reads a single example:

def read_my_file_format(filename_queue):
  reader = tf.SomeReader()  # e.g. tf.TFRecordReader() for TFRecords
  key, record_string = reader.read(filename_queue)
  # e.g. tf.parse_single_example() for serialized tf.train.Example records
  example, label = tf.some_decoder(record_string)
  processed_example = some_processing(example)
  return processed_example, label
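
Since the files are TFRecords, I imagine the concrete reader would look roughly like this (the feature names, dtypes and shapes are placeholders, since they depend on how the records were written):

def read_tfrecord(filename_queue):
  reader = tf.TFRecordReader()
  key, record_string = reader.read(filename_queue)
  # The feature names and types below are illustrative placeholders.
  features = tf.parse_single_example(
      record_string,
      features={
          'image_raw': tf.FixedLenFeature([], tf.string),
          'label': tf.FixedLenFeature([], tf.int64),
      })
  image = tf.decode_raw(features['image_raw'], tf.uint8)
  label = features['label']
  return image, label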


The read_my_file_format function is then used to build shuffled batches:

def input_pipeline(filenames, batch_size):
  filename_queue = tf.train.string_input_producer(filenames)
  example, label = read_my_file_format(filename_queue)

  # min_after_dequeue sets the size of the buffer that batches are
  # sampled from; capacity must be larger than min_after_dequeue.
  example_batch, label_batch = tf.train.shuffle_batch(
      [example, label], batch_size=batch_size, capacity=100,
      min_after_dequeue=10)
  return example_batch, label_batch
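
For reference, I consume the pipeline with the usual coordinator boilerplate (file names, batch size and step count here are placeholders):

example_batch, label_batch = input_pipeline(
    ["file0.tfrecords", "file1.tfrecords"], batch_size=32)

with tf.Session() as sess:
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(sess=sess, coord=coord)
  for step in range(1000):  # placeholder number of steps
    examples, labels = sess.run([example_batch, label_batch])
    # ... training step goes here ...
  coord.request_stop()
  coord.join(threads)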


I am looking for a way to randomly split the data into training and test sets. I don't want to save the training and test examples in separate files; rather, images should be randomly assigned to the training or the test set regardless of which file they are read from. Ideally, I would like to split the input pipeline into a train queue and a test queue.
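
To make the goal concrete, here is a rough, untested sketch of the kind of routing I have in mind. The function name, the capacities and the tf.cond-based dispatch are my own guesses, not something from the docs; older TF versions don't accept a bare op as a tf.cond branch result, hence the control_dependencies trick with dummy return values:

def split_pipeline(example, label, train_fraction=0.8):
  # One queue per partition; shapes must be fully defined
  # if batches are later drawn with dequeue_many().
  dtypes = [example.dtype, label.dtype]
  shapes = [example.get_shape(), label.get_shape()]
  train_queue = tf.FIFOQueue(capacity=100, dtypes=dtypes, shapes=shapes)
  test_queue = tf.FIFOQueue(capacity=100, dtypes=dtypes, shapes=shapes)

  # A uniform draw decides the partition of each incoming example.
  is_train = tf.less(tf.random_uniform([]), train_fraction)

  def enqueue_train():
    with tf.control_dependencies([train_queue.enqueue([example, label])]):
      return tf.constant(True)

  def enqueue_test():
    with tf.control_dependencies([test_queue.enqueue([example, label])]):
      return tf.constant(False)

  routed = tf.cond(is_train, enqueue_train, enqueue_test)

  # Keep the routing op running in a background thread,
  # started later by tf.train.start_queue_runners().
  tf.train.add_queue_runner(tf.train.QueueRunner(train_queue, [routed]))
  return train_queue, test_queue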

This is what I usually do in numpy when I have to split a huge dataset:

import numpy as np
from numpy.random import RandomState

queue = range(10)
weights = (.8, .2)  # create 2 partitions with these weights

def sampler(partition, seed=0):
    # same seed for every partition -> identical draw sequence
    rng = RandomState(seed)
    return lambda x: rng.choice(np.arange(len(weights)), p=weights) == partition

def split(queue, weights):
    # filter the queue once per partition
    return [filter(sampler(partition), queue)
            for partition in range(len(weights))]

(train, test) = split(queue, weights)

print(list(train)) # [0, 1, 2, 3, 4, 5, 6, 9]
print(list(test))  # [7, 8]

Because every sampler replays the same random sequence, the partitions come out exactly complementary without the assignment ever being stored, and this is the property I would like to reproduce inside the TensorFlow pipeline.

