Amazon Elastic MapReduce bootstrap actions not working

Question

I've tried the following combinations of bootstrap action arguments to increase the heap size of my job, but none of them work:

--mapred-key-value mapred.child.java.opts=-Xmx1024m 
--mapred-key-value mapred.child.ulimit=unlimited

--mapred-key-value mapred.map.child.java.opts=-Xmx1024m 
--mapred-key-value mapred.map.child.ulimit=unlimited

-m mapred.map.child.java.opts=-Xmx1024m
-m mapred.map.child.ulimit=unlimited 

-m mapred.child.java.opts=-Xmx1024m 
-m mapred.child.ulimit=unlimited

What is the correct syntax?

amazon-web-services elastic-map-reduce mapreduce hadoop amazon-emr

Shrish bajpai 05 Apr 12 at 7:38

2 answers

Steffen opel · Answer 1 · 2012-04-05T08:01:49+0000

You have two options:

Custom JVM settings

To apply custom settings, review the Bootstrap Actions documentation for Amazon Elastic MapReduce (Amazon EMR), specifically the Configure Daemons action:

This predefined bootstrap action allows you to specify heap size or other Java virtual machine (JVM) settings for the Hadoop daemons. You can use this bootstrap action to tune Hadoop for large jobs that require more memory than Hadoop allocates by default. You can also use this bootstrap action to change advanced JVM options such as garbage collection behavior.

Here is an example that sets the NameNode heap size to 2048 MB and passes a JVM garbage collection option to the NameNode:

$ ./elastic-mapreduce --create --alive \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-daemons \
  --args --namenode-heap-size=2048,--namenode-opts=-XX:GCTimeRatio=19

Predefined JVM Settings

Alternatively, according to the FAQ entry "How do I configure Hadoop settings for my job flow?", if your job flow tasks are memory-intensive, you may choose to use fewer tasks per core and reduce your job tracker heap size. For this situation, a predefined bootstrap action is available to configure the job flow at startup. This refers to the Configure Memory-Intensive Workloads action, which sets cluster-wide Hadoop settings to values appropriate for job flows with memory-intensive workloads, for example:

$ ./elastic-mapreduce --create \
--bootstrap-action \
  s3://elasticmapreduce/bootstrap-actions/configurations/latest/memory-intensive

The specific configuration parameters applied by this predefined bootstrap action are listed in Hadoop Memory-Intensive Configuration Settings.

Good luck!

Kei-ven · Answer 2 · 2014-09-13T01:45:00+0000

Steffen's answer is good and works. On the other hand, if you just want something quick and dirty and only need to override one or two variables, you can set them on the command line, like below:

elastic-mapreduce --create \
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "-m,mapred.child.java.opts=-Xmx999m"
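For more than one override, the same comma-separated form can be assembled with a small helper. This is only a sketch: the function name build_configure_hadoop_args is made up for illustration, the keys are the ones from the question, and it assumes that none of the values contain a comma, since the CLI splits --args on commas.

```shell
#!/bin/sh
# Hypothetical helper (not part of the EMR CLI): turn key=value pairs into
# the comma-separated "-m,key=value" form used with configure-hadoop above.
build_configure_hadoop_args() {
  args=""
  for kv in "$@"; do
    # ${args:+$args,} inserts a separating comma only when args is non-empty.
    args="${args:+$args,}-m,$kv"
  done
  printf '%s\n' "$args"
}

# Print the argument string for the two settings from the question.
build_configure_hadoop_args \
  mapred.child.java.opts=-Xmx1024m \
  mapred.child.ulimit=unlimited
```

The printed string can then be passed as --args "$(build_configure_hadoop_args ...)" in the command above. Whether a given key is honored still depends on the Hadoop version running on the cluster.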

I've seen some other documentation, albeit older, that wraps the entire expression in a single pair of quotes, like this:

--bootstrap-action "s3://elasticmapreduce/bootstrap-actions/configure-hadoop -m \
    mapred.child.java.opts=-Xmx999m"    ### I tried this style, it no longer works!

This is not easy to find in the AWS EMR documentation. I suspect mapred.child.java.opts is one of the most frequently overridden variables out there. I was also looking for an answer when I got the GC error "java.lang.OutOfMemoryError: GC overhead limit exceeded" and came across this page. The default of 200m is too small (see the documentation of the defaults).
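To check whether an override actually reached the cluster, one option is to read the value back from mapred-site.xml on the master node, since the -m flag of configure-hadoop writes to that file. The sketch below makes some assumptions: get_hadoop_property is a made-up helper name, the file uses the usual <property><name>…</name><value>…</value></property> layout with no extra whitespace inside the tags, and the path in the usage comment is only what I would expect on older AMIs.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one property from a Hadoop
# *-site.xml file. $1 = property name, $2 = path to the XML file.
get_hadoop_property() {
  awk -v name="$1" '
    { text = text $0 }                 # join all lines into one string
    END {
      n = split(text, props, "</property>")
      for (i = 1; i <= n; i++) {
        # Skip chunks that do not describe the requested property.
        if (index(props[i], "<name>" name "</name>") == 0) continue
        start = index(props[i], "<value>")
        stop = index(props[i], "</value>")
        if (start > 0 && stop > start) {
          # 7 is the length of the "<value>" opening tag.
          print substr(props[i], start + 7, stop - start - 7)
          exit
        }
      }
    }
  ' "$2"
}

# Usage on the master node (path is an assumption, adjust as needed):
#   get_hadoop_property mapred.child.java.opts /home/hadoop/conf/mapred-site.xml
```

If this prints nothing, the property is not set in that file and Hadoop falls back to its built-in default.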

Good luck!
