Hive runs in local mode, consuming excessive disk space in /tmp
I am running a complex query in Hive that, once started, uses a huge amount of local disk space in the /tmp folder and eventually fails with an out-of-space error, because /tmp fills up completely with the intermediate map-reduce results of that query (/tmp is on a separate partition with 100 GB of free space). While running, it prints:
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there no reduce operator
Job running in-process (local Hadoop)
As you can see above, Hive is somehow running in local mode. After some research online, I checked several relevant parameters; the results are below:
hive> set hive.exec.mode.local.auto;
hive.exec.mode.local.auto=false
hive> set mapred.job.tracker;
mapred.job.tracker=local
hive> set mapred.local.dir;
mapred.local.dir=/tmp/hadoop-hive/mapred/local
So, I have two questions regarding this:
- Could this be the reason that the map-reduce jobs are consuming space on the local disk instead of the HDFS /tmp folder, as is usually the case with Pig scripts?
- How do I get Hive to run in distributed mode given the current settings? Keep in mind that I am using MRv2 on a cluster, but the options above are confusing because they seem to be relevant to MRv1. I could be wrong here, as I am a newbie.
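For comparison, the MRv2-side settings can be inspected in the same way. This is only a minimal sketch: it assumes the standard Hadoop 2 property names `mapreduce.framework.name` (which accepts `local`, `classic`, or `yarn`) and `yarn.resourcemanager.address`, and that a per-session override is permitted on this cluster. None of it was run against the cluster described above, so no output is shown.

```
hive> set mapreduce.framework.name;
hive> set yarn.resourcemanager.address;
hive> -- assumption: override for the current session only, then rerun the query
hive> set mapreduce.framework.name=yarn;
```

If the first command reports `local`, that would be consistent with the "Job running in-process (local Hadoop)" line in the log.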
Any help would be greatly appreciated!