Does Hive run Hadoop jobs for every query?

I am trying to understand how Hive and Hadoop interact. From the tutorials I have read, it seems that before running Hive queries, you run a MapReduce job to produce the input. That seems counterproductive to me: if I have already run the MapReduce job and got the data into an easily parsed format, why not just put the data in a traditional database?

Thanks for your help,
Nathan



3 answers


Hive works with files stored on HDFS. For anything other than basic queries, Hive generates and runs MapReduce jobs. For very simple queries (e.g. SELECT * FROM MyTable), it just streams the files from disk.
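As a rough illustration of that difference, here is a Python sketch (not Hive's actual query planner) contrasting a plain scan with the map, shuffle and reduce phases that a grouped count needs. The rows and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical (key, value) rows standing in for lines of a table's files on HDFS.
Row = tuple[str, int]


def select_star(rows: Iterable[Row]) -> list[Row]:
    """SELECT * FROM t: nothing to aggregate, so the rows are simply read back."""
    return list(rows)


def map_phase(rows: Iterable[Row]) -> Iterator[tuple[str, int]]:
    # Emit (key, 1) per row, like a mapper for COUNT(*) ... GROUP BY key.
    for key, _value in rows:
        yield key, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group emitted values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Sum the per-key counts, like a reducer.
    return {key: sum(values) for key, values in groups.items()}


def group_count(rows: Iterable[Row]) -> dict[str, int]:
    """SELECT key, COUNT(*) FROM t GROUP BY key: needs a full map/shuffle/reduce pass."""
    return reduce_phase(shuffle(map_phase(rows)))
```

For example, with rows [("a", 1), ("b", 2), ("a", 3)], group_count returns {'a': 2, 'b': 1}, while select_star just hands back the three rows unchanged.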



The input does not have to come from MapReduce; it can be a plain text file loaded into HDFS. See http://developer.yahoo.com/hadoop/tutorial/module2.html#commandref
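To show why a plain text file is enough, here is a minimal schema-on-read sketch: the file is just delimited lines, and column names and types are applied only when rows are read. It assumes Hive's default Ctrl-A (\x01) field delimiter for text tables; the schema and function name are hypothetical, and turning missing or unparseable values into None only loosely mimics Hive returning NULL.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical schema: (column name, converter applied when the row is read).
Schema = list[tuple[str, Callable[[str], object]]]

EXAMPLE_SCHEMA: Schema = [("user_id", int), ("country", str), ("amount", float)]

# Default field delimiter for Hive plain-text tables is Ctrl-A.
FIELD_DELIM = "\x01"


def read_rows(lines: Iterable[str], schema: Schema = EXAMPLE_SCHEMA,
              delim: str = FIELD_DELIM) -> Iterator[dict[str, object]]:
    """Schema-on-read: raw lines are split and typed only at query time."""
    for line in lines:
        fields = line.rstrip("\n").split(delim)
        row: dict[str, object] = {}
        for i, (name, convert) in enumerate(schema):
            raw = fields[i] if i < len(fields) else ""
            try:
                # Missing fields become None (like SQL NULL) instead of failing the load.
                row[name] = convert(raw) if raw else None
            except ValueError:
                # Unparseable values also become None rather than rejecting the file.
                row[name] = None
        yield row
```

Because nothing is validated up front, "loading" the data is just copying the file into HDFS.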



Hive fills a very important gap in open-source software by providing the functionality of a massively parallel processing (MPP) database. In other words, it gives us a scale-out SQL analytic engine.
Regarding your question specifically, I see a few main scenarios where Hive is better than an RDBMS:
a) The data is already in HDFS and we have other uses for it (e.g. MapReduce jobs).
b) There is too much data to load into a single RDBMS server.
c) We only need to query the data once or twice. In this case Hive can outperform an RDBMS, because Hive skips the RDBMS's relatively slow data load step.
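To make scenario c) concrete, here is a back-of-the-envelope cost model, assuming a one-off RDBMS load time L, per-query times q_rdbms and q_hive, and no separate load step for Hive because the data already sits in HDFS. Hive's total n * q_hive stays below L + n * q_rdbms while n < L / (q_hive - q_rdbms). The function names and the hours in the example are hypothetical, not measurements.

```python
import math
from typing import Optional


def total_rdbms_hours(n_queries: int, load_hours: float, rdbms_query_hours: float) -> float:
    # The RDBMS must ingest the data once before the first query.
    return load_hours + n_queries * rdbms_query_hours


def total_hive_hours(n_queries: int, hive_query_hours: float) -> float:
    # Data is already in HDFS, so Hive pays no separate load step.
    return n_queries * hive_query_hours


def max_queries_where_hive_wins(load_hours: float, rdbms_query_hours: float,
                                hive_query_hours: float) -> Optional[int]:
    """Largest n with n*q_hive < load + n*q_rdbms, or None if Hive is never slower per query."""
    extra_per_query = hive_query_hours - rdbms_query_hours
    if extra_per_query <= 0:
        return None  # no break-even point: Hive wins for any number of queries
    # Strict inequality n < load / extra, so take the largest integer below the ratio.
    return max(0, math.ceil(load_hours / extra_per_query) - 1)
```

With a hypothetical 10-hour load, 0.5-hour RDBMS queries and 2-hour Hive queries, max_queries_where_hive_wins(10, 0.5, 2) returns 6: after six queries Hive has spent 12 hours against the RDBMS's 13.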





Yes. Hive is built on top of Hadoop, which provides distributed computing. Hive uses HDFS for file storage: each table is stored as a directory of files in HDFS.
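To picture that layout, here is a sketch that treats a local directory as a stand-in for a table's HDFS directory (by default typically under /user/hive/warehouse): a full scan simply reads every data file in it. Skipping names that start with "_" or "." mimics Hadoop's convention of ignoring hidden and marker files such as _SUCCESS. The function name is hypothetical, and only a flat (unpartitioned) directory is handled.

```python
import os
import tempfile
from typing import Iterator


def scan_table(table_dir: str) -> Iterator[str]:
    """Yield every line of every data file in a table directory, in file-name order."""
    for name in sorted(os.listdir(table_dir)):
        path = os.path.join(table_dir, name)
        # Skip subdirectories and hidden/marker files (names starting with "_" or ".").
        if not os.path.isfile(path) or name.startswith(("_", ".")):
            continue
        with open(path, encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        # Hypothetical part files, as a MapReduce job or a manual upload might leave them.
        for name, text in [("part-00000", "a\nb\n"), ("part-00001", "c\n"), ("_SUCCESS", "")]:
            with open(os.path.join(table_dir, name), "w", encoding="utf-8") as f:
                f.write(text)
        print(list(scan_table(table_dir)))  # ['a', 'b', 'c']
```

Adding more data to such a table can be as simple as dropping another file into the directory.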
