Tuesday, September 29, 2015

Review of "Dominant Resource Fairness: Fair Allocation of Heterogeneous Resources in Datacenters"

This is definitely a real problem: there are many situations where a large number of users must share a cluster's resources, and the resource demands of different tasks often differ dramatically. Scheduling them efficiently can yield large performance gains.

The main idea is to consider each user's dominant resource: the resource for which the user's tasks demand the largest fraction of the cluster's total capacity (e.g., CPU is a user's dominant resource if the user's requested CPU is a larger fraction of the cluster's CPU than its requested RAM is of the cluster's RAM). A user's dominant share is its share of its dominant resource. To allocate by DRF, simply keep allocating one task at a time to the user with the minimum dominant share.
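That allocation loop can be sketched in a few lines of Python. This is a minimal illustration, not any real scheduler's implementation: it assumes each user submits identical tasks with a fixed per-task demand vector, and it stops as soon as the selected user's next task no longer fits (the paper's progressive-filling algorithm would keep serving other users).

```python
def drf_allocate(capacity, demands):
    """capacity: dict resource -> total amount in the cluster.
    demands: dict user -> dict resource -> per-task demand.
    Returns dict user -> number of tasks allocated."""
    consumed = {r: 0.0 for r in capacity}  # cluster-wide usage so far
    tasks = {u: 0 for u in demands}        # tasks granted to each user

    # A user's dominant share: the largest fraction of any single
    # resource that its current allocation occupies.
    def dominant_share(u):
        return max(tasks[u] * demands[u][r] / capacity[r] for r in capacity)

    while True:
        # DRF rule: serve the user with the minimum dominant share.
        user = min(demands, key=dominant_share)
        d = demands[user]
        # Simplification: stop when that user's next task doesn't fit.
        if any(consumed[r] + d[r] > capacity[r] for r in capacity):
            break
        for r in capacity:
            consumed[r] += d[r]
        tasks[user] += 1
    return tasks

# The paper's running example: a cluster with 9 CPUs and 18 GB RAM,
# user A's tasks need <1 CPU, 4 GB>, user B's need <3 CPUs, 1 GB>.
# DRF equalizes dominant shares at 2/3: A runs 3 tasks, B runs 2.
alloc = drf_allocate({"cpu": 9, "ram": 18},
                     {"A": {"cpu": 1, "ram": 4},
                      "B": {"cpu": 3, "ram": 1}})
# → {"A": 3, "B": 2}
```

Here A's dominant resource is RAM and B's is CPU, so the loop alternates between them until both hold 2/3 of their dominant resource, matching the example allocation in the paper.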

This solution differs from previous work mainly because large-scale cluster scheduling is a relatively new problem that has only become pressing in recent years.

The paper does identify a hard trade-off in DRF (and, they argue, in any scheduling policy) between resource monotonicity and the share guarantee / Pareto efficiency, giving an example showing that both cannot be satisfied simultaneously. They argue this is a trade-off they make willingly, since adding new resources to a cluster is an infrequent event.

I don't know if I really see this being influential in 10 years. The paper was published five years ago; has it actually been implemented anywhere? It seems like a good scheduling policy, but does it offer a significant performance benefit over existing solutions?
