What's the difference between Luong attention and Bahdanau attention?

These two attention mechanisms are used in seq2seq models. They are introduced as multiplicative and additive attention in the TensorFlow documentation. What is the difference between them?


3 answers


They are explained very well in the PyTorch seq2seq tutorial.



The main difference is in how the similarity between the current decoder input and the encoder outputs is scored.
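
A minimal sketch of the two scoring styles in PyTorch (the tensor names and sizes below are illustrative assumptions, not code from the tutorial; Luong's multiplicative score is shown in its simplest dot-product form):

```python
import torch
import torch.nn as nn

# Illustrative sizes: batch of 1, source length 5, hidden size 8.
hidden_size = 8
encoder_outputs = torch.randn(1, 5, hidden_size)   # one h_s per source position
decoder_hidden  = torch.randn(1, 1, hidden_size)   # current decoder state h_t

# Luong-style (multiplicative) score: a dot product between h_t and each h_s.
luong_scores = torch.bmm(decoder_hidden, encoder_outputs.transpose(1, 2))   # (1, 1, 5)

# Bahdanau-style (additive) score: a small feed-forward net over [h_t; h_s].
W_a = nn.Linear(2 * hidden_size, hidden_size)
v_a = nn.Linear(hidden_size, 1, bias=False)
expanded = decoder_hidden.expand(-1, encoder_outputs.size(1), -1)           # (1, 5, 8)
bahdanau_scores = v_a(torch.tanh(
    W_a(torch.cat([expanded, encoder_outputs], dim=-1))))                   # (1, 5, 1)

# In both cases the scores are softmax-normalised into attention weights
# and used to take a weighted sum of the encoder outputs (the context vector).
luong_weights = torch.softmax(luong_scores, dim=-1)
context = torch.bmm(luong_weights, encoder_outputs)                         # (1, 1, 8)
```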



I went through Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015). In Section 3.1 they describe the difference between the two attention mechanisms as follows:

  1. Luong attention uses the hidden states at the top layers of both the encoder and the decoder, whereas Bahdanau attention attends over the concatenation of the forward and backward source hidden states (of the top hidden layer of the bidirectional encoder).

  2. In Luong attention, the decoder hidden state at time t is used. The attention scores are computed from it, giving a context vector that is concatenated with the decoder hidden state at time t and then fed to the prediction layer.

    In Bahdanau attention, by contrast, the decoder hidden state at time t-1 is used. The alignment scores and the context vector are computed as above, but the context is then combined with the decoder hidden state at time t-1, so this concatenated vector goes into the GRU before the softmax (see the sketch after this list).

  3. Luong attention comes with several alignment score functions, whereas Bahdanau attention uses only the concat (additive) score.
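
A rough PyTorch sketch of the per-step difference described in points 1 and 2 (the module names, shapes, and the shared dot-product scorer are simplifying assumptions; realizing the Bahdanau step by passing the t-1 state as the GRU's hidden state and feeding the context in through the GRU's input is similar in spirit to the PyTorch seq2seq tutorial mentioned above):

```python
import torch
import torch.nn as nn

hidden_size, vocab_size, src_len = 8, 100, 5
gru = nn.GRU(2 * hidden_size, hidden_size, batch_first=True)    # Bahdanau: consumes [y_emb; context]
out_luong = nn.Linear(2 * hidden_size, vocab_size)               # Luong: predicts from [h_t; context]
out_bahdanau = nn.Linear(hidden_size, vocab_size)

encoder_outputs = torch.randn(1, src_len, hidden_size)

def attend(query, keys):
    """Dot-product attention, used here for both variants just to keep the sketch short."""
    weights = torch.softmax(torch.bmm(query, keys.transpose(1, 2)), dim=-1)
    return torch.bmm(weights, keys)                               # context vector

# --- Luong: take the decoder state at time t, attend, then combine and predict. ---
h_t = torch.randn(1, 1, hidden_size)                              # decoder state AFTER the RNN step
context = attend(h_t, encoder_outputs)
logits_luong = out_luong(torch.cat([h_t, context], dim=-1))       # prediction at time t

# --- Bahdanau: take the state at time t-1, attend, feed the context INTO the RNN step. ---
h_prev = torch.randn(1, 1, hidden_size)                           # decoder state at time t-1
y_emb = torch.randn(1, 1, hidden_size)                            # embedding of the previous output token
context = attend(h_prev, encoder_outputs)
_, h_t_new = gru(torch.cat([y_emb, context], dim=-1),             # context enters before the GRU update
                 h_prev.transpose(0, 1))                          # (batch, 1, H) -> (1, batch, H) for h_0
logits_bahdanau = out_bahdanau(h_t_new.transpose(0, 1))           # prediction (pre-softmax) at time t
```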



Alignment methods
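
The figure above presumably shows the table of score functions from Section 3.1 of Luong et al. (2015); for reference, those score functions, together with Bahdanau's additive score, are:

```latex
\[
\mathrm{score}(\boldsymbol{h}_t, \bar{\boldsymbol{h}}_s) =
\begin{cases}
\boldsymbol{h}_t^{\top} \bar{\boldsymbol{h}}_s & \text{dot} \\
\boldsymbol{h}_t^{\top} \boldsymbol{W}_a \bar{\boldsymbol{h}}_s & \text{general} \\
\boldsymbol{v}_a^{\top} \tanh\!\left(\boldsymbol{W}_a [\boldsymbol{h}_t ; \bar{\boldsymbol{h}}_s]\right) & \text{concat}
\end{cases}
\]

% Bahdanau (additive) attention uses only the concat-style score, computed
% from the previous decoder state s_{t-1} and the encoder annotation \bar{h}_s:
\[
e_{ts} = \boldsymbol{v}_a^{\top} \tanh\!\left(\boldsymbol{W}_a \boldsymbol{s}_{t-1} + \boldsymbol{U}_a \bar{\boldsymbol{h}}_s\right)
\]
```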



Professor Chris Manning explains the two methods in a Stanford NLP lecture: https://youtu.be/IxQtK2SjWWM?t=2996







