José Renau, James Tuck, Wei Liu, Luís Ceze, Karin Strauss, and Josep Torrellas
Chip Multiprocessors (CMPs) are flexible, high-frequency platforms on which to support Thread-Level Speculation (TLS). However, for TLS to deliver on its promise, CMPs must exploit multiple sources of speculative task-level parallelism, including any nesting levels of both subroutines and loop iterations. Unfortunately, these environments are hard to support in decentralized CMP hardware: since tasks are spawned out-of-order and unpredictably, maintaining key TLS basics such as task ordering and efficient resource allocation is challenging.
While the concept of out-of-order spawning is not new, this paper is the first to propose a set of microarchitectural mechanisms that, altogether, fundamentally enable fast TLS with out-of-order spawn in a CMP. Moreover, we develop a fully-automated TLS compiler for aggressive out-of-order spawn. With our mechanisms, a TLS CMP with four 4-issue cores achieves an average speedup of 1.30 for full SPECint 2000 applications; the corresponding speedup for in-orderonly spawn is 1.04. Overall, our mechanisms unlock the potential of TLS for the toughest applications.
|Published in||ICS 2005 (International Conference on Supercomputing)|
|Publisher||Association for Computing Machinery, Inc.|
Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or firstname.lastname@example.org. The definitive version of this paper can be found at ACM’s Digital Library --http://www.acm.org/dl/.