Where ZooKeeper and Kafka fit into a Hadoop 2.6 cluster

Hadoop 2.6 uses YARN as the next-generation MapReduce, and YARN is also a cluster manager. Do we need to use ZooKeeper with Hadoop 2.6 for cluster management services? How do I set up ZooKeeper?

How is Kafka connected to a Hadoop cluster? What would the consumer and producer be for Kafka to send data to the Hadoop filesystem?

Where do they all fit in?

I have installed a single-node Hadoop 2.6 cluster. As I understand it, the next step is to have ZooKeeper and Kafka stream data to the Hadoop filesystem. I don't know how to use Kafka with Hadoop, or its API.


2 answers


ZooKeeper is a coordination service for distributed systems. It is used for state coordination in HDFS and YARN high availability, for coordination between the HBase master and region servers, and so on. Kafka works in conjunction with Apache Storm, Apache HBase, and Apache Spark to provide real-time analysis and processing of streaming data. Common use cases:

  • Stream processing
  • Website activity tracking
  • Metrics collection and monitoring
  • Log aggregation


We usually use Kafka along with Storm. A ZooKeeper cluster is needed to coordinate between Nimbus and the Storm supervisors. Kafka needs ZooKeeper to store information about cluster state and consumer offsets.

Basically, ZooKeeper provides a highly available, file-system-like store where users and applications can read and write small pieces of data, such as messages or transaction state. Updates are atomic: a write either completes or fails, and never leaves the data in a partial or unknown state. A ZooKeeper cluster keeps working as long as a majority of its servers is up, so a cluster of N servers tolerates up to (N-1)/2 failures, rounded down. For example, 5 servers tolerate 2 failures. For more details you can refer to the following URLs: 1 2 3
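The majority rule behind ZooKeeper's fault tolerance can be sketched as a small calculation. This is only an illustration of the arithmetic; the function names `quorum_size` and `tolerated_failures` are made up for this example and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print ensemble size, quorum size, and tolerated failures side by side.
    for n in (1, 3, 4, 5, 7):
        print(n, quorum_size(n), tolerated_failures(n))
```

Note that 3 and 4 servers both tolerate a single failure, which is why ensembles are usually sized with an odd number of servers.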


Kafka works on a producer/consumer model: producers write to a topic and consumers read data from a topic. Each consumer can consume data from any available partition of that topic.

Consumers of a topic also register themselves in ZooKeeper to coordinate with each other and balance data consumption.
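To make the producer/consumer idea concrete, here is a toy in-memory model of a topic with partitions. It is not the Kafka client API and talks to no broker; `ToyTopic` and `ToyConsumer` are hypothetical names, and the key-to-partition rule is a simplification chosen only for this sketch.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for one topic; each partition is an append-only list."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> int:
        """Append value to the partition chosen from the key; return that partition."""
        # Summing the key bytes gives a choice that is stable across runs
        # (Python's built-in hash() of a str is salted per process).
        partition = sum(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition


class ToyConsumer:
    """Reads from a topic and remembers the next offset to read per partition."""

    def __init__(self, topic: ToyTopic) -> None:
        self.topic = topic
        self.next_offset = defaultdict(int)

    def poll(self, partition: int) -> list:
        """Return messages not yet seen in this partition and advance the offset."""
        log = self.topic.partitions[partition]
        start = self.next_offset[partition]
        self.next_offset[partition] = len(log)
        return log[start:]
```

A producer calls `produce("host-a", "line 1")`, and a consumer later calls `poll(partition)` to receive only the messages that arrived since its last call. Messages with the same key land in the same partition, so their order is preserved for the consumer reading it.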



Consumers keep track of the maximum offset they have consumed in each partition. If offsets.storage=zookeeper, this value is stored in the ZooKeeper directory /consumers/[group_id]/offsets/[topic]/[broker_id-partition_id] → offset_counter_value (persistent node). Consult the Kafka documentation for more information on how Kafka uses ZooKeeper.
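As a small illustration of that layout, the helper below builds the znode path for one consumer group and partition. `consumer_offset_path` is a hypothetical function written for this example, and the group and topic names in the usage note are invented. It only formats a string and does not connect to ZooKeeper.

```python
def consumer_offset_path(group_id: str, topic: str,
                         broker_id: int, partition_id: int) -> str:
    """Build the znode path under which a consumer group's offset is stored."""
    for name in (group_id, topic):
        # A slash inside a name would add an extra level to the znode path.
        if not name or "/" in name:
            raise ValueError(f"invalid path component: {name!r}")
    return f"/consumers/{group_id}/offsets/{topic}/{broker_id}-{partition_id}"
```

For example, `consumer_offset_path("hdfs-loader", "weblogs", 0, 3)` yields `/consumers/hdfs-loader/offsets/weblogs/0-3`; the data stored at that znode would be the offset counter value.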
