Does TensorFlow's embedding_attention_seq2seq method implement a bidirectional RNN encoder by default?

I used the embedding_attention_seq2seq module for a machine translation task as described in the tutorials referenced in:

https://www.tensorflow.org/versions/master/tutorials/seq2seq/index.html

In seq2seq_model.py, the model from the tutorial, I noticed that it uses GRUCell by default when use_lstm is set to false, in these lines:

# Create the internal multi-layer cell for our RNN.
single_cell = tf.nn.rnn_cell.GRUCell(size)
if use_lstm:
  single_cell = tf.nn.rnn_cell.BasicLSTMCell(size)
cell = single_cell
if num_layers > 1:
  cell = tf.nn.rnn_cell.MultiRNNCell([single_cell] * num_layers)


Now, the attention mechanism described in the paper that the tutorial references makes much more sense if the encoder is bidirectional, so that the attention over the encoder's hidden states takes context from both directions into account. The seq2seq_model file has no mention of a bidirectional component.

So my question is, does embedding_attention_seq2seq implement a bidirectional RNN encoder by default?

If not, is it just using the hidden-state output of each time step of a regular LSTM encoder, thereby limiting the context to only the words in the sentence that came before it?
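For reference, the idea in miniature: a bidirectional encoder runs one RNN left-to-right and another right-to-left, then concatenates the two states at each time step, so every output sees context from both sides. A minimal NumPy sketch (toy dimensions and weights, illustrative only, not the TensorFlow implementation):

```python
import numpy as np

np.random.seed(1)
# Hypothetical toy sizes: T time steps, d_in input dim, d_h hidden dim.
T, d_in, d_h = 4, 3, 2
Wf = np.random.randn(d_h, d_in + d_h) * 0.1  # forward-cell weights
Wb = np.random.randn(d_h, d_in + d_h) * 0.1  # backward-cell weights

def run(W, inputs):
    """Simple tanh RNN: h_t = tanh(W [x_t; h_{t-1}])."""
    h, outs = np.zeros(d_h), []
    for x in inputs:
        h = np.tanh(W @ np.concatenate([x, h]))
        outs.append(h)
    return outs

xs = [np.random.randn(d_in) for _ in range(T)]
fwd = run(Wf, xs)              # left-to-right states
bwd = run(Wb, xs[::-1])[::-1]  # right-to-left states, re-aligned to time order
# Each time step's encoder output sees context from BOTH directions.
outputs = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
print(outputs[0].shape)  # (4,) = 2 * d_h
```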



1 answer


So my question is, does embedding_attention_seq2seq implement a bi-directional RNN encoder by default?

No, it does not implement a bidirectional RNN encoder. The encoder outputs (which are used to build the attention states) are computed in the first few lines of embedding_attention_seq2seq:

# Encoder.
encoder_cell = rnn_cell.EmbeddingWrapper(
    cell, embedding_classes=num_encoder_symbols,
    embedding_size=embedding_size)
encoder_outputs, encoder_state = rnn.rnn(
    encoder_cell, encoder_inputs, dtype=dtype)

The first statement wraps the cell in an EmbeddingWrapper. The second runs encoder_cell forward over encoder_inputs (lines 210-228 of tf/python/ops/rnn.py).
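Conceptually, the embedding wrapper just maps integer symbol ids to dense vectors before the cell sees them. A hedged NumPy sketch of that idea (hypothetical sizes, not TensorFlow's actual code):

```python
import numpy as np

np.random.seed(2)
# Toy embedding table: one row per encoder symbol (hypothetical sizes).
num_encoder_symbols, embedding_size = 10, 4
embedding = np.random.randn(num_encoder_symbols, embedding_size)

encoder_inputs = [3, 7, 1]  # token ids for one sentence
# The "wrapper" step: look up a dense vector for each id,
# which is what the inner RNN cell actually consumes.
embedded = [embedding[i] for i in encoder_inputs]
print(len(embedded), embedded[0].shape)  # 3 (4,)
```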

If not, is it just using the hidden-state output of each time step of a regular LSTM encoder, thereby limiting the context to only the words in the sentence that came before it?

That's right.
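This is easy to see in miniature: in a forward-only RNN, the output at step t depends only on inputs up to t, so changing a later word leaves all earlier outputs untouched. A small NumPy sketch (toy weights and sizes, illustrative only):

```python
import numpy as np

np.random.seed(0)
# Hypothetical toy dimensions, just for illustration.
T, d_in, d_h = 5, 3, 4
Wx = np.random.randn(d_h, d_in) * 0.1
Wh = np.random.randn(d_h, d_h) * 0.1

def encode(inputs):
    """Unidirectional (forward-only) RNN: h_t = tanh(Wx x_t + Wh h_{t-1})."""
    h, outputs = np.zeros(d_h), []
    for x in inputs:
        h = np.tanh(Wx @ x + Wh @ h)
        outputs.append(h)
    return outputs

xs = [np.random.randn(d_in) for _ in range(T)]
out_a = encode(xs)

# Change only the LAST input; every earlier output is identical,
# showing each step's context covers only the words before it.
xs_b = xs[:-1] + [np.random.randn(d_in)]
out_b = encode(xs_b)
print(all(np.allclose(a, b) for a, b in zip(out_a[:-1], out_b[:-1])))  # True
print(np.allclose(out_a[-1], out_b[-1]))  # False (almost surely)
```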
