What's the difference between Luong attention and Bahdanau attention?
They are explained very well in the PyTorch seq2seq tutorial.
The main difference is in how the similarity between the current decoder state and the encoder outputs is scored.
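As a rough illustration (this is not the tutorial's code, and the tensor sizes are made up), Luong's simplest "dot" score is just a dot product, while Bahdanau's score is a small additive feed-forward network:

```python
import torch

H, S = 8, 5                             # hidden size and source length (made-up values)
decoder_state = torch.randn(H)          # the decoder hidden state used as the query
encoder_outputs = torch.randn(S, H)     # one encoder output per source position

# Luong "dot" score: similarity is a plain dot product.
luong_scores = encoder_outputs @ decoder_state                     # shape (S,)

# Bahdanau (additive) score: project both vectors, add, tanh, then a learned vector v.
W_a, U_a, v_a = torch.randn(H, H), torch.randn(H, H), torch.randn(H)
bahdanau_scores = torch.tanh(decoder_state @ W_a + encoder_outputs @ U_a) @ v_a  # (S,)

# Either way, a softmax over the source positions gives the attention weights.
weights = torch.softmax(luong_scores, dim=0)
context = weights @ encoder_outputs                                # context vector, (H,)
```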
I went through the paper Effective Approaches to Attention-based Neural Machine Translation. In section 3.1, the authors describe the differences between the two attention mechanisms as follows:
- I noticed that the hidden states of the top layer are used in both the encoder and the decoder, whereas Bahdanau's attention uses the concatenation of the forward and backward source hidden states from the bidirectional encoder.
- In Luong's attention, the decoder hidden state at time t is used: the attention scores are computed from it, a context vector is obtained, and that context vector is concatenated with the decoder hidden state at time t before making the prediction.
- In Bahdanau's attention, it is the decoder hidden state at time t-1 that is used. The alignment scores and the context vector are computed as above, but the context is combined with the hidden state at time t-1, and this combined vector goes into the GRU before the softmax (see the sketch after this list).
- Luong defines several alignment score functions (dot, general, concat); Bahdanau only uses the concat (additive) score.
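To make the ordering difference from the last two points concrete, here is a minimal, simplified sketch of one decoder step in each style. The names and shapes are my own, not the papers' exact formulations: dot-product attention is used in both branches for brevity, and Bahdanau's output layer is reduced to a plain linear projection.

```python
import torch
import torch.nn as nn

H, S, V = 8, 5, 100                          # hidden size, source length, vocab size (made up)
encoder_outputs = torch.randn(S, H)
luong_gru = nn.GRUCell(H, H)                 # Luong: the RNN only sees the input embedding
bahdanau_gru = nn.GRUCell(2 * H, H)          # Bahdanau: the RNN sees [input; context]
luong_out = nn.Linear(2 * H, V)              # Luong predicts from [h_t; context]
bahdanau_out = nn.Linear(H, V)               # Bahdanau (simplified) predicts from h_t

def attend(query, enc_out):
    # Dot-product scores for brevity; Bahdanau really uses the additive (concat) score.
    weights = torch.softmax(enc_out @ query.squeeze(0), dim=0)        # (S,)
    return (weights @ enc_out).unsqueeze(0)                           # context vector, (1, H)

def luong_step(x_t, h_prev):
    h_t = luong_gru(x_t, h_prev)                          # 1. advance the RNN first
    context = attend(h_t, encoder_outputs)                # 2. attend with h_t
    logits = luong_out(torch.cat([h_t, context], dim=1))  # 3. predict from [h_t; c_t]
    return logits, h_t

def bahdanau_step(x_t, h_prev):
    context = attend(h_prev, encoder_outputs)             # 1. attend with h_{t-1}
    h_t = bahdanau_gru(torch.cat([x_t, context], dim=1), h_prev)  # 2. [x_t; c_t] enters the GRU
    logits = bahdanau_out(h_t)                            # 3. predict from the new state
    return logits, h_t

x_t, h_prev = torch.randn(1, H), torch.randn(1, H)        # dummy input embedding and prior state
luong_logits, _ = luong_step(x_t, h_prev)
bahdanau_logits, _ = bahdanau_step(x_t, h_prev)
```

The only thing this sketch is meant to show is where the context vector enters the step: after the GRU update in Luong, but as part of the GRU input in Bahdanau.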
Professor Chris Manning explains the two methods in a Stanford NLP lecture: https://youtu.be/IxQtK2SjWWM?t=2996