This is definitely a real problem. The paper addresses two issues simultaneously: the scalability of the current Hadoop/MapReduce architecture, and the sharing of cluster resources between MapReduce and other frameworks.
The solution's main idea is to have a central ResourceManager (RM) that controls global state, plus a per-application/job ApplicationMaster (AM) that tracks all of the state for that job. This allows different ApplicationMasters to run different frameworks, and it reduces the load on the ResourceManager, since the RM only has to manage a small amount of global state while the ApplicationMasters deal with the more complex scheduling, load balancing, fault tolerance, etc.
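To make the split concrete for myself, here is a toy sketch of the idea. It is not YARN's actual API: the class names mirror the paper's terms, but every method and field (`allocate`, `release`, `request_and_launch`, `on_task_finished`, `free_slots`) is a name I made up for illustration. The point is only that the central manager tracks free container slots, while the per-job master owns task state and retries.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Toy central manager: tracks only free container slots per node."""
    free_slots: dict  # node name -> number of free containers (hypothetical model)

    def allocate(self, app_id: str, wanted: int) -> list:
        """Grant up to `wanted` containers; return the node of each grant."""
        granted = []
        for node in sorted(self.free_slots):
            while self.free_slots[node] > 0 and len(granted) < wanted:
                self.free_slots[node] -= 1
                granted.append(node)
        return granted

    def release(self, node: str) -> None:
        """Return one container slot on `node` to the free pool."""
        self.free_slots[node] += 1


@dataclass
class ApplicationMaster:
    """Toy per-job master: owns task state and retry logic for one job."""
    app_id: str
    pending: list                                  # tasks waiting for a container
    running: dict = field(default_factory=dict)    # task -> node
    done: list = field(default_factory=list)       # successfully finished tasks

    def request_and_launch(self, rm: ResourceManager) -> None:
        """Ask the RM for containers and start as many pending tasks as granted."""
        for node in rm.allocate(self.app_id, len(self.pending)):
            task = self.pending.pop(0)
            self.running[task] = node

    def on_task_finished(self, rm: ResourceManager, task: str, ok: bool) -> None:
        """Free the container; on failure, requeue the task (the RM never sees this)."""
        rm.release(self.running.pop(task))
        if ok:
            self.done.append(task)
        else:
            self.pending.append(task)
```

Notice that the ResourceManager in this sketch never learns what a task is or whether it failed. All it sees are container grants and releases, which is the sense in which its state stays small.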
This solution is different from previous work for two reasons. First, workloads and clusters have become larger than originally envisioned when Hadoop/MapReduce was architected, and clusters have reached the limits of the original architecture. Second, more frameworks want to run on the same cluster as more applications move to cluster-based processing, so MapReduce needs to play more nicely with these new frameworks, and Hadoop needs to provide a better execution engine for them.
There are definitely some fundamental trade-offs here. Introducing per-job ApplicationMasters increases overhead a little, because there is more communication and latency between the RM and the AMs. However, this reduces load on the central RM, providing better overall scaling. Another trade-off is the flexibility of YARN versus the features it provides. The authors note that creating applications to run on YARN is not really an easy task, because you have to be concerned with all of the intricacies of load balancing, fault tolerance, etc., but this low-level interface allows very general models to run on YARN. The compromise is higher-level layers such as REEF, which make it easier to build applications on YARN by taking care of some of these issues for you.
I think this will likely be influential in 10 years, in part just because I expect the Hadoop ecosystem to still be around then. Even if YARN is no longer the resource manager of choice by that point, it will certainly influence whatever succeeds it.