Shih-wei Liao, Tzu-Han Hung, Donald Nguyen, Hucheng Zhou, Chinyen Chou, and Chiaheng Tu
8 June 2009
A typical data center application requires the processor cycles of thousands of machines. Even a single-digit percentage improvement in performance can significantly reduce the cost and power consumption of a data center. Unfortunately, achieving sustained improvement, even a modest one, is difficult. Data centers are dynamic environments where applications are frequently released and servers are continually upgraded. For maintainability and fault tolerance, the physical capabilities and configuration of the servers are abstracted away from the application programmer. We study application performance under different processor prefetch configurations. These configurations are largely transparent to the programmer, yet we observe a wide range of performance when comparing the worst and best configurations, with relative performance improvements ranging from 1.4% to 75.1%. Alarmingly, one application that consumes many processor cycles shows a 23.6% improvement between its worst and best configurations. Default prefetch configurations favor aggressive memory prefetching, which benefits most applications. However, some data center applications have highly tuned memory behavior, and for them aggressive prefetching severely decreases performance. We develop a tuning framework that attempts to predict the optimal configuration from hardware performance counters. It applies to a large number of performance-critical data center applications without modifying their source code or binaries. On a suite of important data center applications, the framework achieves performance within 1% of the best configuration.
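The abstract does not say how the framework maps counter readings to a configuration. The sketch below is a minimal illustration that assumes a supervised setup: each training application is labeled with its fastest prefetch configuration, and a new application gets the label of the most similar training application. It uses a 1-nearest-neighbor classifier, chosen here only for brevity; this is not necessarily the authors' classifier. It also assumes that "relative performance improvement" means run-time reduction relative to the best run, (worst - best) / best. The configuration names, the two counter features, and all numbers are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical feature vector: counter rates sampled under the default
# prefetch configuration (e.g. cache misses per instruction, bus
# utilization), each assumed to be normalized to [0, 1].
Features = Tuple[float, ...]


def relative_improvement(worst_time: float, best_time: float) -> float:
    """Percent improvement of the best configuration over the worst.

    Assumption: measured as run-time reduction relative to the best run,
    i.e. (worst - best) / best * 100.
    """
    return (worst_time - best_time) / best_time * 100.0


def best_config(runtimes: Dict[str, float]) -> str:
    """Label a training application with its fastest prefetch configuration."""
    return min(runtimes, key=runtimes.__getitem__)


def predict_config(training: List[Tuple[Features, str]], query: Features) -> str:
    """1-nearest-neighbor: reuse the label of the closest training application."""
    _, label = min(training, key=lambda item: math.dist(item[0], query))
    return label


if __name__ == "__main__":
    # Hypothetical measured run times (seconds) for one training application.
    runtimes = {"all_on": 100.0, "no_hw_prefetch": 80.0, "no_adjacent_line": 95.0}
    print(best_config(runtimes))  # no_hw_prefetch
    print(relative_improvement(max(runtimes.values()), min(runtimes.values())))  # 25.0

    # Training set of (counter features, best configuration) pairs.
    training = [
        ((0.9, 0.8), "no_hw_prefetch"),  # bandwidth-heavy application
        ((0.1, 0.2), "all_on"),          # application that benefits from prefetching
    ]
    print(predict_config(training, (0.85, 0.7)))  # no_hw_prefetch
```

The classifier sees only counter readings, never the program text. This matches the abstract's point that such a framework can apply without modifying source code or binaries. A realistic version would need many more training applications and more careful feature selection.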