Carlo Curino, Djellel Difallah, Chris Douglas, Raghu Ramakrishnan, and Sriram Rao
In this paper, we tackle the problem of running a rich mix of jobs, including job pipelines with gang-scheduling and deadlines, by carefully separating two concerns: (1) determining the resource requirements of a job or pipeline (taking deadlines and other considerations into account), and (2) ensuring predictable allocation of the requested resources.
We propose a resource description language that allows each job to specify its resource needs abstractly to the system, exposing many alternative ways of satisfying those needs. This gives the system flexibility in allocating resources across several jobs, while also allowing it to plan ahead and determine whether it can satisfy any given job's resource request. We show the power of this approach by presenting a scheduling framework that uses these rich resource requests to ensure predictable resource allocation for production jobs while minimizing latency for best-effort jobs. Our framework relies on admission control (and quick adaptation to changes in cluster usage) to ensure predictable SLAs for resource reservations, and uses work-preserving preemption to dynamically reallocate resources.
We demonstrate these techniques by building Rayon as an extension to YARN (Hadoop 2.x). This allows us to validate our work in a real context and against some of the most popular schedulers. Our experimental evaluation is based on micro-benchmarks and ten big-data workloads derived from real-world traces from clusters of Cloudera customers, Facebook, Microsoft, and Yahoo!. We also present the results of running our system at a rate of thousands of jobs per hour on a 256-node cluster.
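To make the idea of flexible requests combined with admission control concrete, the following is a minimal sketch, not Rayon's actual language or planning algorithm. It assumes discrete time steps, a single resource type (containers), and a simple first-fit placement; the names `Request`, `Plan`, and `admit` are hypothetical and chosen only for illustration. A request asks for a gang of containers for some duration anywhere before its deadline, and the plan either commits to a concrete start time or rejects the request up front.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """Hypothetical flexible request: a gang of `containers` held for
    `duration` consecutive steps, anywhere inside [arrival, deadline)."""
    containers: int
    duration: int
    arrival: int
    deadline: int


class Plan:
    """Toy capacity plan over a fixed horizon of discrete time steps."""

    def __init__(self, capacity: int, horizon: int) -> None:
        self.capacity = capacity
        self.used: List[int] = [0] * horizon  # containers promised per step

    def admit(self, req: Request) -> Optional[int]:
        """First-fit admission control (an assumption of this sketch).

        Returns the committed start step, or None if no placement inside
        the request's window fits under the cluster capacity.
        """
        last_start = min(req.deadline, len(self.used)) - req.duration
        for start in range(req.arrival, last_start + 1):
            window = range(start, start + req.duration)
            # Gang semantics: all containers must fit at every step.
            if all(self.used[t] + req.containers <= self.capacity for t in window):
                for t in window:
                    self.used[t] += req.containers
                return start
        return None


plan = Plan(capacity=10, horizon=8)
first = plan.admit(Request(containers=8, duration=3, arrival=0, deadline=8))
second = plan.admit(Request(containers=6, duration=3, arrival=0, deadline=8))
third = plan.admit(Request(containers=6, duration=4, arrival=0, deadline=6))
print(first, second, third)  # prints: 0 3 None
```

In this toy run the second request is shifted later in its window rather than rejected, which is the flexibility that exposing alternative placements is meant to provide, while the third request is rejected at submission time because no placement can meet its deadline.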