TensorFlow py_func is handy, but my training step is very slow.

Question


I have a performance issue with TensorFlow's py_func.

Context

In my project I have a batch tensor input_features of shape [?, max_items, m]. The first dimension is ? because it is dynamic (the batch is read by a custom tensor reader and assembled with tf.train.shuffle_batch_join()). The second dimension is an upper bound (the maximum number of items an example can contain), and the third dimension is the size of the feature space.

I also have a tensor num_items whose size equals the batch size (hence its shape (?,)). It gives the number of items in each example; the remaining items are set to 0 (in numpy style: input_features[k, num_items[k]:, :] = 0).
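A small numpy sketch of that padding convention, using made-up sizes (a batch of 3, max_items = 4, m = 2) purely for illustration. The loop is the statement above; the boolean valid mask is an equivalent, loop-free way to mark which item slots are real.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
batch_size, max_items, m = 3, 4, 2

# Start from all-nonzero features so the zeroed padding is easy to see.
input_features = np.arange(1, batch_size * max_items * m + 1,
                           dtype=np.float64).reshape(batch_size, max_items, m)
num_items = np.array([4, 2, 1])  # shape (batch_size,)

# Loop form of the convention: input_features[k, num_items[k]:, :] = 0
for k in range(batch_size):
    input_features[k, num_items[k]:, :] = 0

# Equivalent mask: True where item index j < num_items[k].
valid = np.arange(max_items)[None, :] < num_items[:, None]  # (batch_size, max_items)
```

Keeping such a mask around is often handier than num_items alone, because it broadcasts directly against the feature tensor.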

Question

My workflow needs some custom Python operations (mostly for indexing; for instance, I need to do clustering operations on some examples), so I use several numpy functions wrapped in a py_func. This works, but training becomes very slow (about 50 times slower than the model without this py_func), even though the function itself does not take long to run.

Questions

1 - Is this computation time normal? The function wrapped in py_func returns a new tensor, which is then multiplied further. Does this explain the computation time? (I mean that the gradient may be harder to compute through a function like this.)

2 - I am trying to rework my processing so that it does not use py_func. However, py_func was very handy for fetching data with numpy indexing (especially for data formatting), and I have some difficulty translating that into TensorFlow ops. For example, suppose I have a tensor t1 with shape [-1, n_max, m] (the first dimension is batch_size, which is dynamic) and a tensor t2 with shape [-1, 2] that contains integers. Is there an easy way to compute, with TensorFlow ops, a tensor t_mean_chunk of shape (-1, m) such that, in numpy form, t_mean_chunk[i, :] = np.mean(t1[i, t2[i, 0]:t2[i, 1], :], axis=0)? This was (among other operations) what I was doing in the wrapped function.
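For reference, here is the target computation written out as a plain numpy loop, which is handy as a ground truth when checking any TensorFlow port. The function name mean_chunk_loop and the toy inputs are hypothetical, chosen only for illustration.

```python
import numpy as np

def mean_chunk_loop(t1, t2):
    """Reference: t_mean_chunk[i, :] = np.mean(t1[i, t2[i, 0]:t2[i, 1], :], axis=0)."""
    out = np.empty((t1.shape[0], t1.shape[2]), dtype=np.float64)
    for i in range(t1.shape[0]):
        start, end = t2[i, 0], t2[i, 1]
        # An empty range (start == end) yields NaN plus a RuntimeWarning.
        out[i, :] = np.mean(t1[i, start:end, :], axis=0)
    return out

# Toy inputs (hypothetical): batch of 2, n_max = 4, m = 3.
t1 = np.arange(2 * 4 * 3, dtype=np.float64).reshape(2, 4, 3)
t2 = np.array([[1, 3], [0, 4]])
result = mean_chunk_loop(t1, t2)  # result[0] is [4.5, 5.5, 6.5]
```

The per-row Python loop is exactly what a vectorized version needs to avoid; it is only meant as a correctness check.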


python numpy indexing machine-learning tensorflow

Sstrap 21 Mar '17 at 13:00


Allen Lavoie · Accepted Answer · 2017-03-22T19:01:06+0000

Question 1 is hard to answer without the exact py_func, but as hpaulj mentioned in his comment, it's no surprise that it slows things down. As a worst-case fallback, tf.scan or tf.while_loop with a TensorArray may be somewhat faster. However, the best option is a vectorized solution using TensorFlow ops, which I think is possible in this case.
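To illustrate what "vectorized" means here, a minimal numpy sketch of one loop-free formulation (an illustration, not the answer's code): broadcast a comparison to get a boolean [N, M] mask of the positions inside each range, zero out everything else, sum over the item axis, and divide by the range lengths. The function name range_mean_masked is hypothetical. The same mask-and-reduce pattern can be expressed with TensorFlow comparison ops and tf.reduce_sum, but that port is not shown here.

```python
import numpy as np

def range_mean_masked(index_ranges, values):
    """Mean of values[i, start_i:end_i, ...] for every i, with no Python loop.

    index_ranges: integer array of shape [N, 2] holding (start, end) per row.
    values: array of shape [N, M, ...].
    """
    m_indices = np.arange(values.shape[1])[None, :]        # [1, M]
    starts = index_ranges[:, 0:1]                           # [N, 1]
    ends = index_ranges[:, 1:2]                             # [N, 1]
    selected = (m_indices >= starts) & (m_indices < ends)   # [N, M] boolean mask
    # Append singleton axes so the mask broadcasts over the trailing dims.
    trailing = (1,) * (values.ndim - 2)
    mask = selected.reshape(selected.shape + trailing)
    sums = np.where(mask, values, 0).sum(axis=1)            # [N, ...]
    counts = (ends - starts).reshape((-1,) + trailing)      # [N, 1, ...]
    # Empty ranges divide by zero and give NaN/inf, as the loop would.
    return sums / counts
```

Every row does O(M) work regardless of its range length, which is the usual trade-off of masking: wasted arithmetic on padding in exchange for a single fused, loop-free computation.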

Regarding question 2, I'm not sure if it counts as simple, but here's a function that evaluates your indexing expression:

import tensorflow as tf

def range_mean(index_ranges, values):
  """Take the mean of `values` along ranges specified by `index_ranges`.

  return[i, ...] = tf.reduce_mean(
    values[i, index_ranges[i, 0]:index_ranges[i, 1], ...], axis=0)

  Args:
    index_ranges: An integer Tensor with shape [N x 2]
    values: A Tensor with shape [N x M x ...].
  Returns:
    A Tensor with shape [N x ...] containing the means of `values` having
    indices in the ranges specified.
  """
  m_indices = tf.range(tf.shape(values)[1])[None]
  # Determine which parts of `values` will be in the result
  selected = tf.logical_and(tf.greater_equal(m_indices, index_ranges[:, :1]),
                            tf.less(m_indices, index_ranges[:, 1:]))
  n_indices = tf.tile(tf.range(tf.shape(values)[0])[..., None],
                      [1, tf.shape(values)[1]])
  segments = tf.where(selected, n_indices + 1, tf.zeros_like(n_indices))
  # Throw out segment 0, since that is our "not included" segment
  segment_sums = tf.unsorted_segment_sum(
      data=values,
      segment_ids=segments,
      num_segments=tf.shape(values)[0] + 1)[1:]
  divisor = tf.cast(index_ranges[:, 1] - index_ranges[:, 0],
                    dtype=values.dtype)
  # Pad the shape of `divisor` so that it broadcasts against `segment_sums`.
  divisor_shape_padded = tf.reshape(
      divisor,
      tf.concat([tf.shape(divisor),
                 tf.ones([tf.rank(values) - 2], dtype=tf.int32)], axis=0))
  return segment_sums / divisor_shape_padded
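To see what the segment ids do, here is a numpy mirror of the same steps, with np.add.at standing in for tf.unsorted_segment_sum (a substitution made for illustration only). Row i's selected entries are accumulated into segment i + 1, every entry outside its range lands in segment 0, and segment 0 is then discarded. The function name range_mean_segments is hypothetical.

```python
import numpy as np

def range_mean_segments(index_ranges, values):
    """numpy mirror of range_mean: build segment ids, segment-sum, divide."""
    n, m_max = values.shape[0], values.shape[1]
    m_indices = np.arange(m_max)[None, :]                         # [1, M]
    selected = ((m_indices >= index_ranges[:, :1]) &
                (m_indices < index_ranges[:, 1:]))                # [N, M]
    n_indices = np.tile(np.arange(n)[:, None], [1, m_max])        # [N, M]
    # Row i's selected items go to segment i + 1; the rest go to segment 0.
    segments = np.where(selected, n_indices + 1, 0)
    # Stand-in for tf.unsorted_segment_sum: add values[i, j, ...] into
    # segment_sums[segments[i, j], ...] (unbuffered, so repeats accumulate).
    segment_sums = np.zeros((n + 1,) + values.shape[2:], dtype=values.dtype)
    np.add.at(segment_sums, segments, values)
    segment_sums = segment_sums[1:]                               # drop segment 0
    divisor = (index_ranges[:, 1] - index_ranges[:, 0]).astype(values.dtype)
    # Pad divisor to [N, 1, ...] so it broadcasts against segment_sums.
    divisor = divisor.reshape((-1,) + (1,) * (values.ndim - 2))
    return segment_sums / divisor
```

The plain fancy-index assignment segment_sums[segments] += values would not work here, because repeated indices would overwrite instead of accumulate; np.add.at is the unbuffered version that matches segment-sum semantics.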

Usage example:

import numpy

index_range_tensor = tf.constant([[2, 4], [1, 6], [0, 3], [0, 9]])
values_tensor = tf.reshape(tf.range(4 * 10 * 5, dtype=tf.float32), [4, 10, 5])
with tf.Session():
  tf_result = range_mean(index_range_tensor, values_tensor).eval()
  index_range_np = index_range_tensor.eval()
  values_np = values_tensor.eval()

for i in range(values_np.shape[0]):
  print("Slice {}: ".format(i),
        tf_result[i],
        numpy.mean(values_np[i, index_range_np[i, 0]:index_range_np[i, 1], :],
                   axis=0))

This prints:

Slice 0:  [ 12.5  13.5  14.5  15.5  16.5] [ 12.5  13.5  14.5  15.5  16.5]
Slice 1:  [ 65.  66.  67.  68.  69.] [ 65.  66.  67.  68.  69.]
Slice 2:  [ 105.  106.  107.  108.  109.] [ 105.  106.  107.  108.  109.]
Slice 3:  [ 170.  171.  172.  173.  174.] [ 170.  171.  172.  173.  174.]
