Implementing Fault Tolerance in Distributed Message Queues

Question

Suppose that the intermediate message queue in the figure below fails. Senders can still send their messages through other message queues.

But what happens if the message queue dies after receiving a message? How does the sender know whether the message was delivered to the recipient, so that it can decide whether to forward it to another message queue?

Similarly, what happens if the receiver dies after the message queue delivers a message to it? How is the sender supposed to know whether its request was fulfilled by the receiver?

[Figure: image not available]

fault-tolerance distributed distributed-system message-queue

user782220 Feb 20 '13 at 3:15

1 answer

Recurse · Accepted Answer · 2013-02-20T03:43:19+0000

As a starting point, you should read http://en.wikipedia.org/wiki/Two_Generals%27_Problem .

This is an example of a very famous and very common problem in computer science. Technically, it is considered "solved" because we know the answer; long story short, though, what you are asking for is (strictly speaking) impossible. There are protocols that let you reach any desired level of confidence that the message was delivered (or not), provided that the level is below 1.0.
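To make the "any confidence below 1.0" point concrete, here is a rough sketch of the arithmetic. It assumes (this is not from the question) that each send attempt is lost independently with the same probability; `delivery_confidence` and its parameters are hypothetical names used only for illustration.

```python
def delivery_confidence(loss_probability: float, attempts: int) -> float:
    """Probability that at least one of `attempts` sends arrives.

    Assumption: every attempt is lost independently with probability
    `loss_probability`. Real networks and brokers rarely behave this cleanly.
    """
    if not 0.0 <= loss_probability <= 1.0:
        raise ValueError("loss_probability must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # All attempts are lost with probability p ** n, so at least one
    # arrives with probability 1 - p ** n.
    return 1.0 - loss_probability ** attempts


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, delivery_confidence(0.1, n))
```

Under that assumption, each retry shrinks the chance of total loss, but for any nonzero loss probability the mathematical value never reaches exactly 1.0 (floating-point rounding may display 1.0 for large `attempts`). The same reasoning applies to acknowledgements, which can be lost as well, so the sender's knowledge is never certain either.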

In practice, variations of two- and three-phase distributed transaction protocols are used, along with various retransmission and resynchronization fallbacks. The specifics are implementation-dependent.

Often the choice is to allow duplicate deliveries and require the recipient to handle them appropriately. TCP makes this choice, and if you think about it, TCP is trying to find a reasonable answer to the same question.
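A minimal sketch of the "allow duplicates, make the recipient cope" approach might look like the following. Everything here is hypothetical and in-memory: `Receiver`, `FlakyQueue`, and `send_with_retry` are illustrative names, not part of any real message-queue library, and the queue simulates failure by dropping acknowledgements after the message has already been delivered.

```python
import uuid


class Receiver:
    """Applies each message at most once by remembering message IDs."""

    def __init__(self):
        self.processed = {}  # message_id -> ack returned the first time
        self.applied = []    # payloads that were actually acted upon

    def handle(self, message_id, payload):
        if message_id in self.processed:
            # Duplicate delivery: do not apply again, just repeat the ack.
            return self.processed[message_id]
        self.applied.append(payload)
        ack = f"ack:{message_id}"
        self.processed[message_id] = ack
        return ack


class FlakyQueue:
    """Delivers every message but loses the first `drop_acks` acks."""

    def __init__(self, receiver, drop_acks):
        self.receiver = receiver
        self.drop_acks = drop_acks

    def send(self, message_id, payload):
        ack = self.receiver.handle(message_id, payload)
        if self.drop_acks > 0:
            self.drop_acks -= 1
            return None  # the sender cannot tell whether delivery happened
        return ack


def send_with_retry(queue, payload, max_attempts=5):
    """Resend under the same ID until acked; return the attempts used."""
    message_id = str(uuid.uuid4())  # one ID shared by all retries
    for attempt in range(1, max_attempts + 1):
        if queue.send(message_id, payload) is not None:
            return attempt
    raise TimeoutError(f"no ack after {max_attempts} attempts")


if __name__ == "__main__":
    receiver = Receiver()
    queue = FlakyQueue(receiver, drop_acks=2)
    attempts = send_with_retry(queue, "debit 10")
    print(attempts, receiver.applied)
```

The key design choice is that the sender reuses one message ID across retries, so the recipient can recognise repeats. In a real system the set of processed IDs would need to survive a recipient crash (for example, by being stored durably alongside the effect of the message); this sketch keeps it in memory only.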
