Can't install Ganglia on EMR 4.0.0
I am following this tutorial to install Spark on the latest AMI / EMR cluster:
http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-spark-launch.html
I want to install Ganglia to monitor the cluster, so I added Name=Ganglia
to the list of applications to install:
aws emr create-cluster --name "Spark cluster" --release-label emr-4.0.0 --applications Name=Spark Name=Ganglia --ec2-attributes KeyName=myKey --instance-type m3.xlarge --instance-count 3 --use-default-roles
But I am getting the following error:
A client error (ValidationException) occurred while calling the RunJobFlow operation: The specified application: Ganglia is invalid
Here are the versions I'm running:
aws --version
aws-cli/1.7.41 Python/2.7.7 Linux/2.6.32-431.29.2.el6.x86_64
Ganglia is not part of the EMR 4.0 release.
The official API documentation lists the valid values as "Hadoop", "Hive", "Mahout", "Pig" and "Spark".
It looks like the AWS CLI documentation is wrong.
If you keep using the install-ganglia bootstrap action instead of listing Ganglia under --applications, you should be fine.
// AWS CLI example
aws emr create-cluster \
  --bootstrap-actions file://bootstrap_actions.json \
  ...
// bootstrap_actions.json (a JSON list of bootstrap actions)
[
  {
    "Name": "Install Ganglia",
    "Path": "s3://elasticmapreduce/bootstrap-actions/install-ganglia"
  }
]
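If you would rather generate the file than write it by hand, here is a minimal sketch in Python (standard library only). It assumes that the file passed to --bootstrap-actions should hold a JSON list of action objects. The helper names build_bootstrap_actions and write_bootstrap_actions are hypothetical, made up for this example. The name and S3 path are the ones from this answer. The sketch only writes the file; it does not check that the bootstrap action itself works on your release.

```python
import json


def build_bootstrap_actions():
    """Return the bootstrap actions as a list (hypothetical helper).

    Assumption: the AWS CLI expects a JSON list of objects here,
    each with a "Name" and a "Path".
    """
    return [
        {
            "Name": "Install Ganglia",
            "Path": "s3://elasticmapreduce/bootstrap-actions/install-ganglia",
        }
    ]


def write_bootstrap_actions(path="bootstrap_actions.json"):
    """Write the actions to `path` as JSON and return them (hypothetical helper)."""
    actions = build_bootstrap_actions()
    with open(path, "w") as fh:
        json.dump(actions, fh, indent=2)
    return actions
```

Calling write_bootstrap_actions() once creates bootstrap_actions.json in the current directory, which you can then reference as file://bootstrap_actions.json in the create-cluster command.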
Or from AWS Data Pipeline (excerpt from an example pipeline definition file):
{
"id": "EmrCluster",
"name": "My Cluster (staging)",
"type": "EmrCluster",
"bootstrapAction": [
"s3://elasticmapreduce/bootstrap-actions/install-ganglia"
],
etc..
},