Application manager in YARN setup

I have a setup with 1 NameNode, 2 DataNodes, 1 ResourceManager and 2 NodeManagers, all running as Docker containers. Every time I run a spark-submit (YARN cluster mode) from 2 machines (2 clients), the jobs complete sequentially: Job1 and Job2 both go into the ACCEPTED state, Job1 moves to RUNNING and then FINISHED, and only then is Job2 picked up and executed. Is there any way to get these jobs to run in parallel? How does the Application Manager pick these tasks and hand them to the NodeManagers?
The cluster setup is using the YARN Capacity Scheduler, which is the default in most of the available Hadoop distributions. If multiple jobs are submitted by the same user, they go into the same user queue, which follows FIFO. This is the default behaviour of the Capacity Scheduler.

The Fair Scheduler can be configured to run jobs in parallel by sharing the available resources. Switch the scheduler by adding the corresponding property to yarn-site.xml, then configure the Fair Scheduler queues in an allocation file. If no queues are configured, a queue per user will be created by default.
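A minimal sketch of what that configuration typically looks like (the allocation-file path and the queue names clientA/clientB are illustrative placeholders, not values from the question):

    <!-- yarn-site.xml (excerpt): switch the ResourceManager to the Fair Scheduler -->
    <configuration>
      <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
      </property>
      <property>
        <!-- where the Fair Scheduler reads its queue definitions; the path here is only an example -->
        <name>yarn.scheduler.fair.allocation.file</name>
        <value>/etc/hadoop/conf/fair-scheduler.xml</value>
      </property>
    </configuration>

    <!-- fair-scheduler.xml: example allocation file giving each client its own queue
         with an equal share of the cluster -->
    <allocations>
      <queue name="clientA">
        <weight>1.0</weight>
        <schedulingPolicy>fair</schedulingPolicy>
      </queue>
      <queue name="clientB">
        <weight>1.0</weight>
        <schedulingPolicy>fair</schedulingPolicy>
      </queue>
    </allocations>

After restarting the ResourceManager to pick up the new scheduler, each client can target its own queue, e.g. spark-submit --master yarn --deploy-mode cluster --queue clientA ..., and the two applications will be scheduled side by side instead of one after the other.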