James R. Larus and Michael Parkes
A server application is commonly organized as a collection of concurrent threads, each of which executes the code necessary to process a request. This software architecture causes frequent control transfers between unrelated pieces of code, which decreases instruction and data locality and consequently reduces the effectiveness of hardware mechanisms such as caches, TLBs, and branch predictors. Numerous measurements demonstrate this effect in server applications, which often use only a fraction of a modern processor's computational throughput. This paper addresses the problem through cohort scheduling, a new policy that increases code and data locality by batching the execution of similar operations arising in different server requests. Effective implementation of the policy relies on a new programming abstraction, staged computation, which replaces threads. The StagedServer library provides an efficient implementation of cohort scheduling and staged computation. Measurements of two server applications written with this library show that cohort scheduling can improve server throughput by as much as 20% by reducing processor cycles per instruction by 30% and L2 cache misses by 50%.
In Proceedings of the USENIX 2002 Conference
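To make the batching idea in the abstract concrete, the following is a minimal toy sketch of cohort scheduling, written under stated assumptions. It is not the StagedServer API. The function `run_cohorts`, the stage names `parse`, `lookup`, and `reply`, and the convention that a stage operation returns the name of the next stage (or `None` when finished) are all hypothetical. The scheduler drains every request waiting at one stage (a cohort) before moving to the next stage, so the same code runs back to back for many requests instead of interleaving unrelated code per request.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stage interface (not StagedServer's): an operation handles one
# request and returns the name of the next stage, or None when the request is done.
Op = Callable[[dict], Optional[str]]


def run_cohorts(stages: List[Tuple[str, Op]], requests: List[dict]) -> List[Tuple[str, int]]:
    ops: Dict[str, Op] = dict(stages)
    queues: Dict[str, deque] = {name: deque() for name, _ in stages}
    queues[stages[0][0]].extend(requests)  # every request enters at the first stage
    trace: List[Tuple[str, int]] = []      # (stage, request id) in execution order
    while any(queues.values()):
        for name, _ in stages:
            # Drain the whole cohort waiting at this stage before moving on, so one
            # stage's code and data stay hot across many requests.
            cohort = list(queues[name])
            queues[name].clear()
            for req in cohort:
                trace.append((name, req["id"]))
                nxt = ops[name](req)
                if nxt is not None:
                    queues[nxt].append(req)
    return trace


# Hypothetical three-stage request pipeline used only for illustration.
def parse(req: dict) -> Optional[str]:
    req["key"] = req["id"] * 10
    return "lookup"


def lookup(req: dict) -> Optional[str]:
    req["value"] = f"v{req['key']}"
    return "reply"


def reply(req: dict) -> Optional[str]:
    return None


trace = run_cohorts([("parse", parse), ("lookup", lookup), ("reply", reply)],
                    [{"id": i} for i in range(3)])
print(trace)
# [('parse', 0), ('parse', 1), ('parse', 2), ('lookup', 0), ('lookup', 1),
#  ('lookup', 2), ('reply', 0), ('reply', 1), ('reply', 2)]
```

A thread-per-request design would instead execute `parse`, `lookup`, and `reply` for request 0 before touching request 1. The trace above shows each stage's operations grouped together, which is the locality effect cohort scheduling aims for.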