Capacity scheduler in Amazon Elastic MapReduce

514 Views Asked by jyxlcd At 28 August 2014 at 13:49

I am completely new to Amazon Elastic MapReduce. I want to use my own custom scheduler, which is based on the Hadoop capacity scheduler, to schedule my jobs in Amazon Elastic MapReduce.

As far as I understand, I can do this by defining only one stage in the job flow and submitting my custom jar file over an SSH connection to the master node. However, I cannot find out how to edit the XML configuration files on the master node, such as capacity-scheduler.xml. Does anyone know how to do that?

Moreover, if I want to add dynamic sizing, can I change the number of task nodes in the cluster while a job is running? Or must the size of the cluster stay the same within each stage? Thank you very much.

There is 1 best solution below

user1452132 On 28 August 2014 at 15:33

You should use a bootstrap action to change the Hadoop configuration.

See the following AWS doc on the Configure Hadoop bootstrap action:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-bootstrap.html#PredefinedbootstrapActions_ConfigureHadoop
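
As a rough illustration of what such a bootstrap action looks like when the cluster is created from code, here is a minimal Python sketch. It only builds the bootstrap-action entry as plain data and sends nothing to AWS. The dictionary layout follows what I believe boto3's `run_job_flow` expects under `BootstrapActions`, and `-m key=value` is the mapred-site override form of configure-hadoop as far as I remember, so please check both against the doc above. `configure_hadoop_action` is a helper made up for this example, `com.example.MyCapacityScheduler` is a placeholder for your own scheduler class, and `mapred.jobtracker.taskScheduler` is the Hadoop 1 property name, which may not apply to your AMI version.

```python
# Sketch only: builds the bootstrap-action entry as plain data; nothing is sent to AWS.

# Predefined Configure Hadoop bootstrap action (verify the path in the AWS doc).
CONFIGURE_HADOOP = "s3://elasticmapreduce/bootstrap-actions/configure-hadoop"


def configure_hadoop_action(mapred_overrides):
    """Return a bootstrap-action entry that overrides mapred-site.xml keys.

    Hypothetical helper for this example, not part of any AWS library.
    """
    args = []
    for key, value in sorted(mapred_overrides.items()):
        # Assumption: "-m key=value" overrides one mapred-site.xml property.
        args += ["-m", f"{key}={value}"]
    return {
        "Name": "Configure Hadoop scheduler",
        "ScriptBootstrapAction": {"Path": CONFIGURE_HADOOP, "Args": args},
    }


action = configure_hadoop_action({
    # Hadoop 1 property name; the class name is a placeholder for your scheduler.
    "mapred.jobtracker.taskScheduler": "com.example.MyCapacityScheduler",
})
print(action["ScriptBootstrapAction"]["Args"])
```

The resulting entry would go into the list of bootstrap actions passed when the job flow is started. The jar containing the custom scheduler still has to be on the Hadoop classpath of the master node, and this sketch does not cover that.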

This blog article that I bookmarked also has some info. http://sujee.net/tech/articles/hadoop/amazon-emr-beyond-basics/

For changing the cluster size dynamically, one option is to use the AWS SDK.
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/calling-emr-with-java-sdk.html

With the following interface you can modify the instance count of an instance group (see its modifyInstanceGroups method). http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticmapreduce/AmazonElasticMapReduce.html
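
To make the resize call concrete, here is a minimal Python sketch of the same idea. It assumes the boto3 EMR client, whose `modify_instance_groups` call is the Python counterpart of the Java method, rather than the Java SDK linked above. `build_resize_request` and `resize_group` are helpers made up for this example, and the instance group ID shown is a placeholder. The client is passed in as a parameter, so the logic can be tried without AWS credentials.

```python
# Sketch only: the EMR client is passed in, so no AWS call happens at import time.

def build_resize_request(instance_group_id, new_count):
    """Build the arguments for a ModifyInstanceGroups call (hypothetical helper)."""
    if new_count < 0:
        raise ValueError("instance count must not be negative")
    return {
        "InstanceGroups": [
            {"InstanceGroupId": instance_group_id, "InstanceCount": new_count}
        ]
    }


def resize_group(client, instance_group_id, new_count):
    """Ask EMR to change the target size of one instance group."""
    request = build_resize_request(instance_group_id, new_count)
    # Assumption: `client` behaves like boto3.client("emr").
    client.modify_instance_groups(**request)
    return request


# Usage against a real cluster (requires boto3 and credentials, not run here):
#   import boto3
#   resize_group(boto3.client("emr"), "ig-EXAMPLE", 8)   # placeholder group ID
```

The call only sets the requested size; EMR then adds or removes nodes in the background, so it can be issued while a job is running.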
