James R. Larus and Michael Parkes
A server is commonly organized as a collection of concurrent tasks, each of which runs the server’s code to process a request. The concurrency is built on threads, processes, or event-driven code, all of which provide control independence among tasks and dynamic scheduling to mask high-latency operations such as I/O and communication. Unfortunately, many of these servers run poorly on modern processors. Measurements show that they utilize only a fraction of a processor’s potential performance. In part, this shortfall is attributable to the programs’ software architecture, which frequently jumps between unrelated pieces of code, thereby destroying the instruction and data locality that hardware mechanisms such as caches, TLBs, and branch predictors depend on. This paper describes cohort scheduling, a policy that increases code and data locality by batching the execution of similar operations arising in different server requests. Staged computation is a programming model that helps structure programs in a manner conducive to cohort scheduling. The StagedServer library provides an efficient implementation of cohort scheduling built on this model. Measurements show that cohort scheduling improves server throughput by as much as 13% and reduces cycles per instruction (CPI) by as much as 18%.
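The batching idea behind cohort scheduling can be illustrated with a minimal sketch. This is not the StagedServer API; all names here are hypothetical. Requests flow through a pipeline of stages, and instead of running one request through every stage before starting the next request, the scheduler drains each stage's whole batch of pending operations — a cohort — back to back, so a stage's code and data stay hot in the cache:

```python
from collections import deque

class Stage:
    """A pipeline stage with a queue of operations waiting to run."""
    def __init__(self, name):
        self.name = name
        self.pending = deque()

def cohort_schedule(stages, requests, trace):
    """Drain each stage's cohort in turn, handing requests to the next stage."""
    for r in requests:
        stages[0].pending.append(r)
    for i, stage in enumerate(stages):
        while stage.pending:                     # run the whole cohort back to back
            r = stage.pending.popleft()
            trace.append((stage.name, r))        # stand-in for the stage's real work
            if i + 1 < len(stages):
                stages[i + 1].pending.append(r)  # hand off to the next stage

trace = []
cohort_schedule([Stage("parse"), Stage("reply")], ["req1", "req2", "req3"], trace)
# Every "parse" operation runs before any "reply" operation:
# [('parse', 'req1'), ('parse', 'req2'), ('parse', 'req3'),
#  ('reply', 'req1'), ('reply', 'req2'), ('reply', 'req3')]
```

A request-at-a-time scheduler would instead interleave the two stages (`parse req1`, `reply req1`, `parse req2`, …), repeatedly evicting each stage's working set.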