Enrique Vallejo, Ramón Beivide, Adrián Cristal, Tim Harris, Fernando Vallejo, Osman Unsal, and Mateo Valero
4 December 2010
Many shared-memory parallel systems use lock-based
synchronization mechanisms to provide mutual exclusion
or reader-writer access to memory locations. Software locks are
inefficient either in memory usage, lock transfer time, or both.
Proposed hardware locking mechanisms are either too specific
(for example, requiring static assignment of threads to cores
and vice-versa), support a limited number of concurrent locks,
require tag values to be associated with every memory location,
rely on the low latencies of single-chip multicore designs, or are
slow in adversarial cases such as suspended threads in a lock
queue. Additionally, few proposals cover reader-writer locks
and their associated fairness issues.
In this paper we introduce the Lock Control Unit (LCU),
an acceleration mechanism collocated with each core
to explicitly handle fast reader-writer locking. By associating a
unique thread-id with each lock request, we decouple the hardware
lock from the requestor core. This provides correct and efficient
execution in the presence of thread migration. By making the
LCU logic autonomous from the core, it seamlessly handles
thread preemption. Our design offers richer semantics than
previous proposals, such as trylock support, while providing
direct core-to-core transfers.
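The lock semantics described above (reader-writer access, trylock, and ownership tied to a thread-id rather than to a core) can be illustrated with a small software model. This is only a sketch under stated assumptions: the class RWLockModel and its methods try_acquire and release are hypothetical names chosen for illustration, not the LCU interface, and the model does not capture the hardware queues, direct core-to-core transfers, or fairness policy of the actual design.

```python
import threading


class RWLockModel:
    """Toy software model of reader-writer trylock semantics.

    Hypothetical illustration only; this is not the LCU hardware interface.
    Ownership is recorded by a caller-supplied thread-id, so it does not
    matter which core (or OS thread) issues the acquire or the release.
    """

    def __init__(self):
        self._mutex = threading.Lock()  # protects the bookkeeping below
        self._readers = set()           # thread-ids holding the lock in read mode
        self._writer = None             # thread-id holding the lock in write mode

    def try_acquire(self, tid, write):
        """Non-blocking acquire: return True on success, False otherwise."""
        with self._mutex:
            if write:
                # A writer needs exclusive access: no writer and no readers.
                if self._writer is None and not self._readers:
                    self._writer = tid
                    return True
                return False
            # Readers may share the lock as long as no writer holds it.
            if self._writer is None:
                self._readers.add(tid)
                return True
            return False

    def release(self, tid):
        """Release whatever mode the given thread-id currently holds."""
        with self._mutex:
            if self._writer == tid:
                self._writer = None
            elif tid in self._readers:
                self._readers.discard(tid)
            else:
                raise ValueError(f"thread-id {tid} does not hold the lock")


lock = RWLockModel()
print(lock.try_acquire(1, write=False))  # reader 1 enters
print(lock.try_acquire(2, write=False))  # reader 2 shares the lock
print(lock.try_acquire(3, write=True))   # writer fails while readers are present
```

Because the holder is identified by thread-id, a release issued after the thread has migrated to another core still matches the original request, which is the decoupling the abstract refers to. A blocking lock with fair ordering would need a request queue on top of this model.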
We evaluate our proposal with microbenchmarks, a fine-grain
Software Transactional Memory system and programs
from the Parsec and Splash parallel benchmark suites. The
lock transfer time decreases by up to 30% when compared to
previous hardware proposals. Transactional Memory systems
limited by reader-locking congestion speed up by up to 3x while still
preserving graceful fairness and starvation freedom properties.
Finally, commonly used applications achieve speedups of up to
7% when compared to software models.
In 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO