Can't install Ganglia on EMR 4.0.0

I am following this tutorial to install Spark on the latest AMI / EMR cluster:

http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-spark-launch.html

I want to install Ganglia to monitor the cluster, so I added Name=Ganglia to the list of applications to install:

aws emr create-cluster --name "Spark cluster" --release-label emr-4.0.0 --applications Name=Spark Name=Ganglia --ec2-attributes KeyName=myKey --instance-type m3.xlarge --instance-count 3 --use-default-roles

But I am getting the following error:

A client error (ValidationException) occurred while calling the RunJobFlow operation: The specified application: Ganglia is invalid

Here are the versions I'm running:

aws --version
aws-cli/1.7.41 Python/2.7.7 Linux/2.6.32-431.29.2.el6.x86_64

4 answers


Ganglia is not part of the EMR 4.0 release.

The official API documentation lists the valid values: "Hadoop", "Hive", "Mahout", "Pig", and "Spark".

It looks like the AWS CLI documentation is wrong.
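
A local pre-flight check can catch this before the RunJobFlow call is made. The following is a minimal Python sketch, assuming the five names quoted above are the complete list for emr-4.0.0 and that names must match exactly. The function names are hypothetical, and nothing here queries AWS, so the set has to be updated by hand for other release labels.

```python
# Application names quoted above as valid for release emr-4.0.0.
# Assumption: copied from that answer, not fetched from AWS; it will be
# out of date for later release labels.
EMR_4_0_0_APPS = {"Hadoop", "Hive", "Mahout", "Pig", "Spark"}


def unsupported_applications(requested, supported=EMR_4_0_0_APPS):
    """Return the requested names that are not in the supported set (exact match)."""
    return [name for name in requested if name not in supported]


def applications_args(requested, supported=EMR_4_0_0_APPS):
    """Build '--applications Name=X ...' CLI arguments, or fail fast locally."""
    bad = unsupported_applications(requested, supported)
    if bad:
        raise ValueError("not available in this release: " + ", ".join(bad))
    return ["--applications"] + ["Name=" + name for name in requested]


if __name__ == "__main__":
    print(applications_args(["Spark"]))  # ['--applications', 'Name=Spark']
    try:
        applications_args(["Spark", "Ganglia"])
    except ValueError as err:
        print(err)  # not available in this release: Ganglia
```

The returned list can be spliced into a `subprocess` argument list for `aws emr create-cluster`, so a typo or an unavailable application fails on your machine instead of at the API.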

The usual Ganglia bootstrap action also fails in emr-4.0.0. See this issue.

Amazon is probably working on an official Ganglia release for EMR 4.x. Until it exists, you can use this bootstrap action:

s3://support.elasticmapreduce/release/4.x/ganglia/install_ganglia_emr-4.0.0.rb
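
One way to pass that bootstrap action to `aws emr create-cluster` is a file referenced with `--bootstrap-actions file://bootstrap_actions.json`. The CLI expects that file to hold a JSON list of objects with a `Path` and optionally a `Name` and `Args`. Below is a minimal Python sketch that prints such a list. The action name "Install Ganglia" and the script name used in the usage line are arbitrary, hypothetical choices, not something this answer specifies.

```python
import json

# Bootstrap action path from the answer above (EMR 4.x Ganglia installer).
GANGLIA_BA_PATH = (
    "s3://support.elasticmapreduce/release/4.x/ganglia/install_ganglia_emr-4.0.0.rb"
)


def bootstrap_actions(path, name="Install Ganglia", args=None):
    """Return the list-of-actions structure read by --bootstrap-actions file://...

    The default name is a hypothetical label chosen for this sketch.
    """
    action = {"Name": name, "Path": path}
    if args:  # optional arguments handed to the bootstrap script
        action["Args"] = list(args)
    return [action]


if __name__ == "__main__":
    # Print the JSON so it can be redirected into bootstrap_actions.json.
    print(json.dumps(bootstrap_actions(GANGLIA_BA_PATH), indent=2))
```

Usage, assuming the script is saved as `make_bootstrap_actions.py`: run `python make_bootstrap_actions.py > bootstrap_actions.json`, then add `--bootstrap-actions file://bootstrap_actions.json` to the `create-cluster` command.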

If you keep using the bootstrap action, you should be fine.

# AWS CLI example
aws emr create-cluster \
  --bootstrap-actions file://bootstrap_actions.json \
  ...

// bootstrap_actions.json (the CLI expects a JSON list of actions)
[
  {
    "Name": "Install Ganglia",
    "Path": "s3://elasticmapreduce/bootstrap-actions/install-ganglia"
  }
]
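
Because the CLI parses this file as JSON, a stray trailing comma or a missing enclosing list makes the whole call fail. Here is a small Python sketch for checking the file before launching a cluster. The function name is hypothetical, and requiring only a non-empty string `Path` per entry is a deliberately loose assumption, not EMR's full validation rules.

```python
import json


def check_bootstrap_actions(text):
    """Parse bootstrap_actions.json content and return the list of actions.

    Raises ValueError if the text is not valid JSON (json.JSONDecodeError is a
    ValueError subclass), is not a list, or has an entry without a Path string.
    """
    actions = json.loads(text)
    if not isinstance(actions, list):
        raise ValueError("expected a JSON list of bootstrap actions")
    for i, action in enumerate(actions):
        if not isinstance(action, dict):
            raise ValueError(f"entry {i} is not a JSON object")
        path = action.get("Path")
        if not isinstance(path, str) or not path:
            raise ValueError(f"entry {i} needs a non-empty Path")
    return actions
```

For example, `check_bootstrap_actions(open("bootstrap_actions.json").read())` returns the parsed list for a well-formed file and raises `ValueError` for a bare object followed by a trailing comma.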

Or from Data Pipeline (excerpt from an example pipeline definition file):

{
  "id": "EmrCluster",
  "name": "My Cluster (staging)",
  "type": "EmrCluster",
  "bootstrapAction": [
    "s3://elasticmapreduce/bootstrap-actions/install-ganglia"
  ],
  ...
},
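
The excerpt shows only one object. As a sketch, assuming the usual definition-file layout in which pipeline objects sit in a top-level "objects" array, the following Python builds that EmrCluster object and wraps it. The helper names are hypothetical, and the remaining fields elided in the excerpt (plus any other objects a working pipeline needs) are intentionally left out.

```python
import json

GANGLIA_BA = "s3://elasticmapreduce/bootstrap-actions/install-ganglia"


def emr_cluster_object(obj_id, name, bootstrap_actions):
    """Build an EmrCluster pipeline object with the given bootstrap actions.

    Only the fields shown in the excerpt are set; a real pipeline needs more.
    """
    return {
        "id": obj_id,
        "name": name,
        "type": "EmrCluster",
        "bootstrapAction": list(bootstrap_actions),
    }


def pipeline_definition(*objects):
    """Wrap pipeline objects in the top-level "objects" array of a definition."""
    return {"objects": list(objects)}


if __name__ == "__main__":
    cluster = emr_cluster_object("EmrCluster", "My Cluster (staging)", [GANGLIA_BA])
    print(json.dumps(pipeline_definition(cluster), indent=2))
```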
