Hive runs in local mode, consuming excessive disk space in /tmp

I am running a complex query in Hive that, once started, consumes a huge amount of local disk space under /tmp and eventually fails with an out-of-space error, because /tmp fills up completely with the intermediate map/reduce results of the query (/tmp lives on a separate partition with 100 GB of free space). While running, it prints:

Execution completed successfully

MapredLocal task succeeded

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Job running in-process (local Hadoop)
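The diagnosis can be confirmed directly while the query runs (a sketch; the scratch path is the mapred.local.dir value checked later in this post):

```shell
# Size of Hive's local scratch data -- this grows as the query runs
du -sh /tmp/hadoop-hive/mapred/local

# Free space remaining on the /tmp partition
df -h /tmp
```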

As you can see above, Hive is somehow running in local mode. After doing some research online, I checked several relevant parameters; the results are below:

hive> set hive.exec.mode.local.auto;

hive.exec.mode.local.auto=false

hive> set mapred.job.tracker;

mapred.job.tracker=local

hive> set mapred.local.dir;

mapred.local.dir=/tmp/hadoop-hive/mapred/local
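On MRv2, the corresponding check is the Hadoop 2 property rather than the MRv1 mapred.job.tracker (an aside; this property name is the standard Hadoop 2 one, not output from the original session):

```
hive> set mapreduce.framework.name;
```

A value of `yarn` means jobs are submitted to the cluster; `local` means in-process execution.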

So, I have two questions regarding this:

  • Could this be the reason the map-reduce jobs consume space on the local disk rather than in the HDFS /tmp folder, as usually happens with Pig scripts?
  • How do I get Hive to run in distributed mode given the current settings? Note that I am using MRv2 on the cluster, but the options above are confusing because they look like MRv1 settings. As a newbie, I could be wrong here.

Any help would be greatly appreciated!





1 answer


It turns out I was missing something basic: after setting HADOOP_MAPRED_HOME to /usr/lib/hadoop-mapreduce on all nodes, the problems were fixed.
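For anyone hitting the same issue, the fix can be sketched as follows (the variable name and path are from the answer above; where exactly to set it depends on your install, e.g. hadoop-env.sh or the profile of the user running Hive):

```shell
# On every node, point Hadoop at the MRv2 (YARN) MapReduce libraries
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
```

After restarting the Hive session, `set mapred.job.tracker;` should no longer report `local`.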











