For in-depth information on various Big Data technologies, check out my free e-book “Introduction to Big Data“.

Large-scale computer clusters are challenging to utilize efficiently. Originally, Hadoop was restricted mainly to the paradigm MapReduce, where the resource management is done by JobTracker and TaskTacker. The JobTracker farms out MapReduce tasks to specific nodes in the cluster, ideally the nodes that have the data, or at least are in the same rack. A TaskTracker is a node in the cluster that accepts tasks – Map, Reduce and Shuffle operations – from a JobTracker. Because Hadoop has stretched beyond MapReudce (e.g. HBase, Storm, etc.), Hadoop now architecturally decouples the resource management features from the programming model of MapReduce, which makes Hadoop clusters more generic. The new resource manager is referred to as MapReduce 2.0 (MRv2) or YARN. Now MapReduce is a kind of application running in a YARN container and other types of applications can be written generically to run on YARN. YARN employs a master-slave model and includes several components:

  • The global Resource Manager is the ultimate authority that arbitrates resources among all applications in the system.
  • The per-application Application Master negotiates resources from the Resource Manager and works with the Node Managers to execute and monitor the component tasks.
  • The per-node slave Node Manager is responsible for launching the applications’ containers, monitoring their resource usage and reporting to the Resource Manager.

The Resource Manager, consisting of Scheduler and Application Manager, is the central authority that arbitrates resources among various competing applications in the cluster. The Scheduler is responsible for allocating resources to the various running applications subject to the constraints of capacities, queues etc. The Application Manager is responsible for accepting job-submissions, negotiating the first container for executing the application specific Application Master and provides the service for restarting the Application Master container on failure.

The Scheduler uses the abstract notion of a Resource Container which incorporates elements such as memory, cpu, disk, network etc. Initially, YARN uses the memory-based scheduling. Each node is configured with a set amount of memory and applications request containers for their tasks with configurable amounts of memory. Recently, YARN added CPU as a resource in the same manner. Nodes are configured with a number of “virtual cores” (vcores) and applications give a vcore number in the container request.

The Scheduler has a pluggable policy plug-in, which is responsible for partitioning the cluster resources among the various queues, applications etc. For example, the Capacity Scheduler is designed to maximize the throughput and the utilization of shared, multi-tenant clusters. Queues are the primary abstraction in the Capacity Scheduler. The capacity of each queue specifies the percentage of cluster resources that are available for applications submitted to the queue. Furthermore, queues can be set up in a hierarchy. YARN also sports a Fair Scheduler that tries to assign resources to applications such that all applications get an equal share of resources over time on average using dominant resource fairness.

The protocol between YARN and applications is as follows. First an Application Submission Client communicates with the Resource Manager to acquire a new Application Id. Then it submit the Application to be run by providing sufficient information (e.g. the local files/jars, command line, environment settings, etc.) to the Resource Manager to launch the Application Master. The Application Master is then expected to register itself with the Resource Manager and request for and receive containers. After a container is allocated to it, the Application Master communicates with the Node Manager to launch the container for its task by specifying the launch information such as command line specification, environment, etc. The Application Master also handles failures of job containers. Once the task is completed, the Application Master signals the Resource Manager.

As the central authority of the YARN cluster, the Resource Manager is also the single point of failure (SPOF). To make it fault tolerant, an Active/Standby architecture can be employed since Hadoop 2.4. Multiple Resource Manager instances (listed in the configuration file yarn-site.xml) can be brought up but only one instance is Active at any point of time while others are in Standby mode. When the Active goes down or becomes unresponsive, another Resource Manager is automatically elected by a Zookeeper-based method to be the Active. Clients, Application Masters and Node Managers try connecting to the Resource Managers in a round-robin fashion until they hit the new Active.