Keras - no broadcast Dot layer?

I am using Keras 2.0.2 with the TensorFlow backend. I am trying to take a batch dot product as part of a layer. I'm not really sure how to do this, and none of the layers I've seen seem to have the functionality I want.

Specifically, I have two layers of shape (None, 2, 50, 5, 3) and (None, 2, 50, 3, 1), and I want to take the dot product over dimension "3", broadcasting over the (None, 2, 50) dimensions, i.e. I need the output to have shape (None, 2, 50, 5, 1). My use case is very simple: at each timestep of a sequence I compute a matrix (5, 3) and a vector (3, 1), and I want to take their dot product at each timestep.

Here's an example showing what I'm running into:

import numpy as np
import keras
import keras.backend as K
from keras.layers import Dot, Input

v1 = K.variable(value=np.random.rand(2, 50, 5, 3))
v2 = K.variable(value=np.random.rand(2, 50, 3, 1))
K.batch_dot(v1, v2)  # this works as desired, gives output shape: (2, 50, 5, 1)

x1 = Input((2, 50, 3, 5)) # shape: (None, 2, 50, 3, 5)
x2 = Input((2, 50, 3, 1)) # shape: (None, 2, 50, 3, 1)
Dot(3)([x1, x2]) # output shape is (None, 2, 50, 5, 2, 50, 1)


Strange, because the code for the Dot layer ( https://github.com/fchollet/keras/blob/master/keras/layers/merge.py ) actually uses K.batch_dot, yet the behavior is not the same.

It also contradicts the behavior stated in the docs: "E.g. if applied to two tensors a and b of shape (batch_size, n), the output will be a tensor of shape (batch_size, 1) where each entry i will be the dot product between a[i] and b[i]."

I've tried other things without success, e.g. wrapping K.batch_dot in a Lambda layer (which can only take one input - shouldn't there be an equivalent general-purpose layer taking multiple inputs?) or wrapping a Dot layer in a TimeDistributed layer (which doesn't work because TimeDistributed can't handle a list as input).

Any advice would be greatly appreciated!



1 answer


So, I figured out a couple of ways to do this:

1) If you want the matrix to be trainable parameters that you apply to a vector (which is some function of your input), you can simply use a TimeDistributed-wrapped Dense layer with linear activation and no bias. A sketch of this is shown below.
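A minimal sketch of option (1). The shapes here are illustrative assumptions (a single length-3 vector at each of 50 timesteps), not the exact shapes from the question:

import numpy as np
from keras.layers import Dense, Input, TimeDistributed
from keras.models import Model

# Illustrative shape: one length-3 vector at each of 50 timesteps.
vec = Input((50, 3))  # shape: (None, 50, 3)

# Dense(5) with no bias and the default linear activation multiplies
# each timestep's vector by a single trainable (3, 5) weight matrix.
out = TimeDistributed(Dense(5, use_bias=False))(vec)  # shape: (None, 50, 5)

model = Model(vec, out)
print(model.predict(np.random.rand(4, 50, 3)).shape)  # (4, 50, 5)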



2) If you want the matrix to be some function of your input (i.e. one layer's output, reshaped into a matrix, applied to a vector that is also some function of your input), you can wrap K.batch_dot in a Lambda layer that takes a single list argument as input, i.e. something like Lambda(lambda x: K.batch_dot(x[0], x[1]))([x1, x2]). The problem with what I did before was that I didn't pass my inputs as a list, and a Lambda layer can't accept more than one input.
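A minimal sketch of option (2), using the shapes from the question. The batch dims (None, 2, 50) match exactly, so batch_dot behaves just as in the K.batch_dot example above:

import numpy as np
import keras.backend as K
from keras.layers import Input, Lambda
from keras.models import Model

x1 = Input((2, 50, 5, 3))  # matrices, shape: (None, 2, 50, 5, 3)
x2 = Input((2, 50, 3, 1))  # vectors,  shape: (None, 2, 50, 3, 1)

# The Lambda layer receives both tensors as a single list argument.
out = Lambda(lambda x: K.batch_dot(x[0], x[1]))([x1, x2])  # (None, 2, 50, 5, 1)

model = Model([x1, x2], out)
a = np.random.rand(4, 2, 50, 5, 3)
b = np.random.rand(4, 2, 50, 3, 1)
print(model.predict([a, b]).shape)  # (4, 2, 50, 5, 1)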

There is still the documentation problem, and (IMO) a missing built-in layer for use case (2). But the solutions above should work.
