HBase and Hadoop

From what I've read so far, HBase requires Hadoop to be installed. It also looks like HBase can be configured either to use an existing Hadoop cluster (shared with other users) or to use a dedicated Hadoop cluster. I suspect the latter is the safer configuration, but I'm wondering whether anyone has experience with the former (assuming my understanding of the HBase setup is correct in the first place).



3 answers

I know that Facebook and other large organizations decouple their HBase cluster (real-time access) from their Hadoop cluster (batch analytics) for performance reasons. Large MapReduce jobs on a shared cluster can degrade real-time read/write performance, which can be problematic.

In a smaller organization, or in a situation where your HBase response times do not need to be consistent, you can simply use the same cluster.

Other than performance, there are few (or no) problems with coexistence.
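For reference, pointing HBase at an existing, shared cluster is mostly a matter of setting `hbase.rootdir` to that cluster's HDFS namenode in `hbase-site.xml`. Below is a minimal sketch; the hostnames, port, and ZooKeeper quorum are placeholders, not values from this thread:

```xml
<!-- hbase-site.xml: minimal sketch for running HBase on an existing,
     shared HDFS cluster. All hostnames/ports are hypothetical. -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- Points at the shared cluster's namenode; HBase keeps its data under /hbase -->
    <value>hdfs://shared-namenode.example.com:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
</configuration>
```

A dedicated-cluster setup looks the same; only the namenode in `hbase.rootdir` changes.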



We deployed HBase on an existing Hadoop cluster with 1000 cores. Short answer: it works great, at least with Cloudera CH2 +149.88. But your mileage may vary depending on the Hadoop version.



In distributed mode, HBase uses Hadoop for HDFS storage. HBase stores its HFiles on HDFS and thus benefits from the replication strategy and data-locality principles provided by the datanodes.

RegionServers mostly serve data that is local to them, but may need to fetch blocks from other datanodes.
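To make the replication point concrete: HFiles written by HBase are ordinary HDFS files, so they inherit the block replication factor configured in `hdfs-site.xml`. A minimal fragment (3 is HDFS's default value):

```xml
<!-- hdfs-site.xml: HBase's HFiles are regular HDFS files, so each of
     their blocks is replicated this many times across datanodes. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```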

Hope this helps you understand why and how Hadoop is used with HBase.


