Batch_dot with variable batch size in Keras

I am trying to write a layer that combines two tensors with a formula like this:

The shapes of x[0] and x[1] are both (?, 1, 500).

M is a 500 × 500 matrix.

I want the output to be (?, 500, 500), which I believe is theoretically possible: the layer should output (1, 500, 500) for each pair of inputs of shape (1, 1, 500) and (1, 1, 500). Since batch_size is variable (dynamic), the output should be (?, 500, 500).

However, I don't fully understand the axes argument, and although I've tried every combination of axes, none of them gives the result I want.

I have tried numpy.tensordot and keras.backend.batch_dot (with the TensorFlow backend). If the batch size is fixed, say a has shape (100, 1, 500), then for example batch_dot(a, M, (2, 0)) outputs a tensor of shape (100, 1, 500).
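For example, a quick sketch of that fixed-batch-size case with numpy.tensordot (my own illustration, just to make the shapes concrete):

import numpy as np

a = np.ones((100, 1, 500))             # fixed batch size of 100
M = np.ones((500, 500))
out = np.tensordot(a, M, axes=(2, 0))  # contract a's last axis with M's first
print(out.shape)                       # (100, 1, 500)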

I'm a newbie to Keras, sorry for such a basic question, but it has taken me 2 days to figure this out and it's driving me crazy :(

    def call(self, x):
        input1 = x[0]
        input2 = x[1]
        # self.M is defined in the build() function
        output = K.batch_dot(...)
        return output


Update:

Sorry for the late update. I tried Daniel's answer with TensorFlow as the Keras backend, and it still raises a ValueError complaining about unequal dimensions.

I then tried the same code with Theano as the backend, and now it works:

>>> import numpy as np
>>> import keras.backend as K
Using Theano backend.
>>> from keras.layers import Input
>>> x1 = Input(shape=[1,500,])
>>> M = K.variable(np.ones([1,500,500]))
>>> firstMul = K.batch_dot(x1, M, axes=[1,2])


I don't know how to print a tensor's shape in Theano; it's definitely harder for me than TensorFlow. However, it works.
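One way to check the shape without touching raw Theano is to ask Keras for the static shape it tracks on its own tensors (a sketch; this assumes K.int_shape is available in the Keras version in use):

>>> K.int_shape(firstMul)  # the static shape Keras has inferred for this tensor

(Raw Theano tensors have symbolic shapes, so there is no static shape to print on the Theano variable itself.)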

To understand why, I read through the two backend implementations of batch_dot, for TensorFlow and for Theano. Below are the differences.

In this case, x is (?, 1, 500), y is (1, 500, 500), and axes = [1, 2].

In tensorflow_backend:

return tf.matmul(x, y, adjoint_a=True, adjoint_b=True)


In theano_backend:

return T.batched_tensordot(x, y, axes=axes)


(I've omitted the code that follows, which adjusts out._keras_shape and does not affect the values.)
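Tracing the static shapes through the TensorFlow line suggests where the ValueError comes from (my own reconstruction, not something stated in the backend):

import tensorflow as tf

x = tf.placeholder('float32', shape=(None, 1, 500))
y = tf.placeholder('float32', shape=(1, 500, 500))

# adjoint_a treats x as (?, 500, 1) and adjoint_b treats y as (1, 500, 500).
# matmul then requires x's last dimension (1) to equal y's second-to-last
# dimension (500), so graph construction fails with a ValueError.
# Theano's batched_tensordot, by contrast, tolerates this via broadcasting.
out = tf.matmul(x, y, adjoint_a=True, adjoint_b=True)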



2 answers


Your multiplications should select which axes they use via the axes argument of the batch_dot function.

  • Axis 0 is the batch dimension: your ?

  • Axis 1 is the dimension you say has length 1

  • Axis 2 is the last dimension, of size 500

You won't change the batch dimension, so you will always use batch_dot with axes=[1, 2].



But for that to work, M must have shape (?, 500, 500).
To achieve this, define M not as (500, 500) but as (1, 500, 500) instead, and repeat it along the first axis to match the batch size:

import keras.backend as K

# M has shape (1, 500, 500); we repeat it along the batch axis.
BatchM = K.repeat_elements(x=M, rep=batch_size, axis=0)
# Not sure the repeating is really necessary: leaving M as (1, 500, 500)
# gives the same output shape at the end, but I haven't checked the actual
# numbers for correctness. I believe it's totally OK.

# Now we can use batch_dot properly:
firstMul = K.batch_dot(x[0], BatchM, axes=[1, 2])  # will result in (?, 500, 500)

# We also need to transpose x[1]:
x1T = K.permute_dimensions(x[1], (0, 2, 1))

# And the second multiplication:
result = K.batch_dot(firstMul, x1T, axes=[1, 2])



I prefer using TensorFlow, so I spent the past few days trying to solve this with TensorFlow.

The first way is very similar to Daniel's solution.

x = tf.placeholder('float32',shape=(None,1,3))
M = tf.placeholder('float32',shape=(None,3,3))
tf.matmul(x, M)
# return: <tf.Tensor 'MatMul_22:0' shape=(?, 1, 3) dtype=float32>


You must feed values for M with the appropriate shapes:

sess = tf.Session()
sess.run(tf.matmul(x,M), feed_dict = {x: [[[1,2,3]]], M: [[[1,2,3],[0,1,0],[0,0,1]]]})
# return : array([[[ 1.,  4.,  6.]]], dtype=float32)


Another simple way is with tf.einsum.



x = tf.placeholder('float32',shape=(None,1,3))
M = tf.placeholder('float32',shape=(3,3))
tf.einsum('ijk,kl->ijl', x, M)
# return: <tf.Tensor 'MatMul_22:0' shape=(?, 1, 3) dtype=float32>

      

Let's feed in some values.

sess.run(tf.einsum('ijk,kl->ijl', x, M), feed_dict = {x: [[[1,2,3]]], M: [[1,2,3],[0,1,0],[0,0,1]]})
# return: array([[[ 1.,  4.,  6.]]], dtype=float32)


Now M is a 2D tensor, and there is no need to pass a batch_size into M.
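Putting it together for the original 500-sized shapes, here is a sketch of the whole two-step product in einsum form (my own guess at the intended formula, (x[0]·M)ᵀ·x[1], inferred from the shapes in the question):

import tensorflow as tf

x0 = tf.placeholder('float32', shape=(None, 1, 500))
x1 = tf.placeholder('float32', shape=(None, 1, 500))
M = tf.placeholder('float32', shape=(500, 500))

firstMul = tf.einsum('ijk,kl->ijl', x0, M)        # (?, 1, 500)
# Contract the size-1 axis of both operands, leaving (?, 500, 500):
result = tf.einsum('ijk,ijl->ikl', firstMul, x1)  # (?, 500, 500)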

What's more, it now seems this can be solved in TensorFlow with tf.einsum. Does that mean Keras falls back to tf.einsum in some situations? At least I can't find anywhere that Keras calls tf.einsum. And in my opinion, batch_dot behaves strangely when given a 3D tensor and a 2D tensor. In Daniel's answer, he tiles M to (1, 500, 500), but in K.batch_dot() a 2D M is automatically adjusted to (500, 500, 1). I find that TensorFlow handles this with its broadcasting rules, and I'm not sure whether Keras does the same.
