HBase and Hadoop

Question


From what I've read so far, HBase requires Hadoop to be installed. It looks like HBase can be configured either to use an existing Hadoop cluster (shared with other users) or to use a dedicated Hadoop cluster. I think the latter would be the safer configuration, but I'm wondering whether anyone has experience with the former. (I'm also not sure that my understanding of the HBase setup is correct.)

+1

hbase hadoop

kee Mar 30 12 at 1:31


3 answers

We installed it on an existing Hadoop cluster that has 1000 cores. Short answer: it works great, at least with Cloudera CDH2 +149.88. Depending on your Hadoop version, though, your mileage may vary.

0

MrGomez Mar 30 12 at 1:39


In distributed mode, HBase uses Hadoop for HDFS storage. HBase stores its HFiles on HDFS and thus benefits from the replication and data locality that the DataNodes provide.

RegionServers mainly handle local data, but may need to fetch data from other DataNodes.

Hope this helps you understand why and how Hadoop is used with HBase.
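Which HDFS an HBase instance stores its files on is determined by `hbase.rootdir` in hbase-site.xml, so the shared-versus-dedicated choice largely comes down to which NameNode URI you point it at. Below is a minimal sketch in Python that renders such a file. The helper name `build_hbase_site`, the host names, and port 8020 are assumptions for illustration only; the three property names are standard HBase settings, but check the values against your own cluster.

```python
import xml.etree.ElementTree as ET


def build_hbase_site(namenode_uri, zookeeper_hosts):
    """Return hbase-site.xml text for a distributed HBase that stores data on HDFS."""
    props = {
        # Directory on HDFS under which HBase keeps its HFiles and logs.
        "hbase.rootdir": namenode_uri.rstrip("/") + "/hbase",
        # "true" selects distributed mode instead of standalone mode.
        "hbase.cluster.distributed": "true",
        # Comma-separated ZooKeeper hosts used for coordination.
        "hbase.zookeeper.quorum": ",".join(zookeeper_hosts),
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical hosts: use the NameNode of the shared or the dedicated cluster here.
print(build_hbase_site("hdfs://namenode.example.com:8020", ["zk1.example.com"]))
```

Pointing `namenode_uri` at the existing cluster's NameNode gives the shared setup; pointing it at a separate NameNode gives the dedicated one. Nothing else in this sketch changes between the two.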

-1

Adrien M. Apr 14 '12 at 9:18


Donald Miner · Accepted Answer · 2012-03-30T02:49:09+0000

I know that Facebook and other large organizations separate their HBase cluster (real-time access) from their Hadoop cluster (batch analytics) for performance reasons. Large MapReduce jobs running on a cluster can degrade the performance of the real-time interface, which can be problematic.

In a smaller organization, or in a situation where your HBase response times do not need to be consistent, you can simply use the same cluster.

Apart from performance, there are few (if any) problems with running the two on the same cluster.
