Hive runs in local mode, taking up excessive disk space in /tmp

Question

I am running a complex query in Hive that, once started, uses a huge amount of local disk space in the /tmp folder and eventually fails with an out-of-space error, because /tmp fills up completely with the intermediate map-reduce results of that query (/tmp is mounted on a separate partition with 100 GB of free space). While running, it prints:

Execution completed successfully

MapredLocal task succeeded

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Job running in-process (local Hadoop)

As you can see above, Hive is somehow running in local mode. After some research online, I checked several relevant parameters; the results are below:

hive> set hive.exec.mode.local.auto;

hive.exec.mode.local.auto=false

hive> set mapred.job.tracker;

mapred.job.tracker=local

hive> set mapred.local.dir;

mapred.local.dir=/tmp/hadoop-hive/mapred/local

So, I have two questions regarding this:

Could this be the reason that the map-reduce jobs consume space on the local disk instead of the HDFS /tmp folder, as usually happens with Pig scripts?
How do I get Hive to run in distributed mode given the current settings? Keep in mind that I am using MRv2 on the cluster, but the options above are confusing because they seem to be relevant to MRv1. I could be wrong here, as I am a newbie.

Any help would be greatly appreciated!

hadoop hive mrv2 cloudera-cdh

user5092078 02 Aug 15 at 19:32

1 answer

user5092078 · Answer 1 · 2015-08-06T19:27:39+0000

It turns out I was missing something basic. After setting HADOOP_MAPRED_HOME to /usr/lib/hadoop-mapreduce on all nodes, all the problems were fixed.
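
For anyone wanting to try the same fix, here is a minimal sketch of what it might look like in a shell on each node. The path is the one from the answer above; the existence check and the final Hive call are my own additions for illustration, and where you persist the variable (for example a profile script or Hive's hive-env.sh) depends on your installation.

```shell
# Path taken from the answer above; adjust it if your layout differs.
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

# Illustrative sanity check (not part of the original answer):
# warn if the directory is missing on this node.
if [ ! -d "$HADOOP_MAPRED_HOME" ]; then
  echo "warning: $HADOOP_MAPRED_HOME not found on $(uname -n)" >&2
fi

# If the hive CLI is on the PATH, print the execution-mode settings.
# mapred.job.tracker is the MRv1-era property from the question;
# mapreduce.framework.name is its MRv2 counterpart.
if command -v hive >/dev/null 2>&1; then
  hive -e 'set mapreduce.framework.name; set mapred.job.tracker;'
fi
```

If the fix has taken effect, the job log should presumably no longer show "Job running in-process (local Hadoop)".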
