How to achieve distributed processing and high availability at the same time in Kafka?

Question

I have a topic with n partitions. For distributed processing, I create two processes on different machines. They subscribe to the topic with the same group id and allocate n/2 threads each, with each thread processing a single stream (n/2 partitions per process).

With this I achieve load balancing, but if process 1 crashes, process 2 cannot consume messages from the partitions allocated to process 1, since it only listened on n/2 streams at startup.

Alternatively, if I configure for HA and start n streams/threads in both processes, then when one node fails, all partitions will be processed by the other node. But this compromises distribution, since all partitions will be handled by a single node at any given time.

Is there a way to achieve both at the same time, and if so, how?

scalability message-queue apache-kafka kafka-consumer-api high-availability

Sumit jain 05 May '15 at 18:10

1 answer

nelsonda · Answer 1 · 2015-05-06T12:40:25+0000

Yes, use an existing stream-processing engine. Storm is a good choice, as are Spark and Samza, depending on your use case.

You can roll your own, but as you have found out, managing failed processes and high availability is tricky. Generally speaking, distributed processing is full of subtle problems that someone else has already solved. In your shoes, I would use existing software to solve this problem.
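
If you do want to reason about the roll-your-own route first, a toy model of partition assignment can help compare the two setups from the question. The sketch below is not Kafka code. It assumes a range-style scheme in which the live consumer threads are sorted and each one receives a contiguous block of partitions, and it assumes the split is recomputed whenever a process leaves the group. The function names `range_assign` and `partitions_per_process` are hypothetical.

```python
def range_assign(num_partitions, thread_ids):
    """Toy range-style assignment (an assumption, not Kafka's implementation):
    sort the live threads, then give each a contiguous block of partitions."""
    threads = sorted(thread_ids)
    per_thread, extra = divmod(num_partitions, len(threads))
    assignment = {}
    start = 0
    for pos, tid in enumerate(threads):
        # The first `extra` threads take one additional partition each.
        count = per_thread + (1 if pos < extra else 0)
        assignment[tid] = list(range(start, start + count))
        start += count
    return assignment


def partitions_per_process(num_partitions, threads_per_process, live_processes):
    """Group the per-thread assignment by owning process."""
    thread_ids = [
        (proc, t) for proc in live_processes for t in range(threads_per_process)
    ]
    owned = {proc: [] for proc in live_processes}
    for (proc, _), parts in range_assign(num_partitions, thread_ids).items():
        owned[proc].extend(parts)
    return owned
```

For example, `partitions_per_process(4, 2, ["p1", "p2"])` models the n/2-threads setup with both processes alive, `partitions_per_process(4, 2, ["p2"])` models the same setup after process 1 is gone, and `partitions_per_process(4, 4, ["p1", "p2"])` models the n-threads setup. Under these assumptions a surviving process ends up owning every partition once the split is recomputed, while starting n threads per process leaves the second process idle as long as both are alive. Whether a real consumer group behaves this way depends on its rebalancing and assignment strategy.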
