For in-depth information on various Big Data technologies, check out my free e-book “Introduction to Big Data”.
Large-scale computer clusters are challenging to utilize efficiently. Originally, Hadoop was restricted mainly to the MapReduce paradigm, where resource management is done by the JobTracker and the TaskTrackers. The JobTracker farms out MapReduce tasks to specific nodes in the cluster, ideally the nodes that hold the data, or at least nodes in the same rack. A TaskTracker is a node in the cluster that accepts tasks (Map, Reduce and Shuffle operations) from a JobTracker. Because Hadoop has stretched beyond MapReduce (e.g. HBase, Storm, etc.), it now architecturally decouples the resource management features from the MapReduce programming model, which makes Hadoop clusters more generic. The new resource manager is referred to as MapReduce 2.0 (MRv2) or YARN.
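
To make the locality preference concrete, here is a minimal Python sketch of how a scheduler might choose a node for a task: first a node that holds the input data, then a node in the same rack, then any node with capacity. This is an illustration only, not Hadoop's actual code; `Node`, `pick_node`, `free_slots` and `replica_hosts` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Node:
    """Hypothetical cluster node; not a Hadoop class."""
    name: str
    rack: str
    free_slots: int


def pick_node(nodes: Iterable[Node], replica_hosts: Set[str]) -> Optional[Node]:
    """Pick a node for a task whose input block lives on replica_hosts.

    Preference order: data-local node, then rack-local node, then any node.
    Only nodes with at least one free slot are considered.
    """
    nodes = list(nodes)
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        return None  # no capacity anywhere; the task has to wait

    # Racks that contain at least one replica of the input block.
    replica_racks = {n.rack for n in nodes if n.name in replica_hosts}

    def locality(n: Node) -> int:
        if n.name in replica_hosts:
            return 0  # data-local
        if n.rack in replica_racks:
            return 1  # rack-local
        return 2      # off-rack

    # min() returns the first candidate with the lowest locality level.
    return min(candidates, key=locality)


cluster = [
    Node("n1", "rack1", free_slots=0),
    Node("n2", "rack1", free_slots=2),
    Node("n3", "rack2", free_slots=1),
]
# The block lives on n1, which is busy, so the rack-local n2 is chosen.
print(pick_node(cluster, {"n1"}).name)
```

A real scheduler also weighs fairness, queue capacity and how long a task has already waited for a local slot; the sketch deliberately ignores all of that.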