How do I modify the TensorFlow sequence-to-sequence model to use a bidirectional LSTM instead of a unidirectional one?
Please refer to this post for background on the issue: Does the TensorFlow embedding_attention_seq2seq method use a bidirectional RNN encoder by default?
I am working on the same model and want to replace the unidirectional LSTM layer with a bidirectional one. I understand that I need to use static_bidirectional_rnn instead of static_rnn, but I am getting an error caused by a mismatch in tensor shapes.
I replaced the following line:
encoder_outputs, encoder_state = core_rnn.static_rnn(encoder_cell, encoder_inputs, dtype=dtype)
with this line:
encoder_outputs, encoder_state_fw, encoder_state_bw = core_rnn.static_bidirectional_rnn(encoder_cell, encoder_cell, encoder_inputs, dtype=dtype)
This gives me the following error:
InvalidArgumentError (see above for traceback): Incompatible shapes: [32,5,1,256] vs. [16,1,1,256] [[Node: gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/Shape, gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/Shape_1)]]
I understand that the outputs of the two methods are different, but I don't know how to change the attention code to account for this. How do I pass both the forward and backward states to the attention module - should I merge the two hidden states?
From the error message it looks like the batch dimensions of two tensors don't match somewhere: one is 32 and the other is 16. I suppose this is because the outputs of the bidirectional RNN are twice the size of the unidirectional ones (the forward and backward outputs are concatenated along the depth axis), and the code that follows is not adjusted for that.
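For example, in tf.contrib.legacy_seq2seq the encoder outputs are reshaped into attention_states using encoder_cell.output_size; with a bidirectional encoder each output has twice that depth, so the reshape has to use the doubled size. A minimal sketch, assuming the variable names from the question and that the surrounding code otherwise stays as in the library (an illustration, not a drop-in patch):

encoder_outputs, encoder_state_fw, encoder_state_bw = core_rnn.static_bidirectional_rnn(
    encoder_cell, encoder_cell, encoder_inputs, dtype=dtype)

# Each element of encoder_outputs is the forward and backward output concatenated
# along the depth axis, so its size is 2 * encoder_cell.output_size.
top_states = [tf.reshape(e, [-1, 1, 2 * encoder_cell.output_size])
              for e in encoder_outputs]
attention_states = tf.concat(top_states, 1)  # shape [batch, time, 2 * output_size]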
How do I pass both the forward and backward states to the attention module - do I merge the two hidden states?
You can reference this code:
def _reduce_states(self, fw_st, bw_st):
    """Add to the graph a linear layer to reduce the encoder's final FW and BW state into a single initial state for the decoder. This is needed because the encoder is bidirectional but the decoder is not.

    Args:
      fw_st: LSTMStateTuple with hidden_dim units.
      bw_st: LSTMStateTuple with hidden_dim units.

    Returns:
      state: LSTMStateTuple with hidden_dim units.
    """
    hidden_dim = self._hps.hidden_dim
    with tf.variable_scope('reduce_final_st'):
        # Define weights and biases to reduce the cell and reduce the state
        w_reduce_c = tf.get_variable('w_reduce_c', [hidden_dim * 2, hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
        w_reduce_h = tf.get_variable('w_reduce_h', [hidden_dim * 2, hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
        bias_reduce_c = tf.get_variable('bias_reduce_c', [hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
        bias_reduce_h = tf.get_variable('bias_reduce_h', [hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)

        # Apply linear layer
        old_c = tf.concat(axis=1, values=[fw_st.c, bw_st.c])  # Concatenation of fw and bw cell
        old_h = tf.concat(axis=1, values=[fw_st.h, bw_st.h])  # Concatenation of fw and bw state
        new_c = tf.nn.relu(tf.matmul(old_c, w_reduce_c) + bias_reduce_c)  # Get new cell from old cell
        new_h = tf.nn.relu(tf.matmul(old_h, w_reduce_h) + bias_reduce_h)  # Get new state from old state
        return tf.contrib.rnn.LSTMStateTuple(new_c, new_h)  # Return new cell and state
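A rough sketch of how this could plug into the encoder from the question, assuming _reduce_states is a method on your model class and the decoder cell has hidden_dim units (illustrative only):

encoder_outputs, encoder_state_fw, encoder_state_bw = core_rnn.static_bidirectional_rnn(
    encoder_cell, encoder_cell, encoder_inputs, dtype=dtype)

# Merge the forward and backward final states into a single LSTMStateTuple that a
# unidirectional decoder can use as its initial state.
encoder_state = self._reduce_states(encoder_state_fw, encoder_state_bw)

# encoder_outputs already holds the concatenated fw/bw activations per time step,
# so it can feed the attention mechanism as long as the doubled depth is handled
# as shown above.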