Aleksandar Dragojević, Yang Ni, and Ali-Reza Adl-Tabatabai
In this paper, we identify transaction-local memory as a major source of overhead from compiler instrumentation in software transactional memory (STM). Transaction-local memory is memory allocated inside a transaction that cannot escape (i.e., is captured by) the allocating transaction. Accesses to such memory do not require calls to STM memory access functions (i.e., STM barriers). A compiler that is unaware of this property may nevertheless translate accesses to captured memory into expensive STM barriers, which presents an opportunity to improve STM performance. Our measurements with the STAMP benchmark suite (version 0.9.9) revealed that as many as 60% of the STM barriers generated by our baseline compiler access captured memory, including 90% of the write barriers and 45% of the read barriers. We propose runtime and compiler optimizations to elide STM barriers to captured memory. These techniques can also elide barriers for accesses to thread-local and read-only data. We implemented these optimizations in the Intel C++ STM compiler. Our experiments with the STAMP benchmark suite on an Intel Dunnington system (with 24 cores in a 4-node SMP system) show that these optimizations can improve performance by up to 18% at 16 threads.
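To make the idea of eliding barriers on captured memory concrete, the following is a minimal single-threaded sketch of a runtime check. It is an illustration under stated assumptions, not the Intel STM runtime or the paper's implementation: `ToyTx`, `tx_malloc`, `is_captured`, `DemoResult`, and `run_demo` are hypothetical names, the "full barrier" is reduced to an undo-log entry (a real barrier would also handle conflict detection), and blocks allocated in the transaction are assumed to be freed on abort.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical toy transaction descriptor (not a real STM API).
struct ToyTx {
    struct Range { std::uintptr_t begin, end; };
    std::vector<Range> captured;                   // blocks allocated inside this transaction
    std::vector<std::pair<long*, long>> undo_log;  // (address, old value) for shared writes
    std::size_t full_barriers = 0;
    std::size_t elided_barriers = 0;

    // Allocate inside the transaction and remember the block as captured.
    void* tx_malloc(std::size_t bytes) {
        void* p = std::malloc(bytes);
        if (p) {
            auto b = reinterpret_cast<std::uintptr_t>(p);
            captured.push_back({b, b + bytes});
        }
        return p;
    }

    // Runtime capture check: does addr fall inside a block this transaction allocated?
    bool is_captured(const void* addr) const {
        auto a = reinterpret_cast<std::uintptr_t>(addr);
        for (const Range& r : captured)
            if (a >= r.begin && a < r.end) return true;
        return false;
    }

    // Write barrier: captured memory gets a plain store, shared memory is logged first.
    void write(long* addr, long value) {
        if (is_captured(addr)) {
            ++elided_barriers;  // no other thread can see this block yet
            *addr = value;
            return;
        }
        ++full_barriers;
        undo_log.emplace_back(addr, *addr);
        *addr = value;
    }

    // Abort: roll back shared writes and free captured blocks (assumed policy).
    void abort() {
        for (auto it = undo_log.rbegin(); it != undo_log.rend(); ++it)
            *it->first = it->second;
        for (const Range& r : captured)
            std::free(reinterpret_cast<void*>(r.begin));
        undo_log.clear();
        captured.clear();
    }
};

struct DemoResult { std::size_t full; std::size_t elided; long shared_after_abort; };

// Two writes to a freshly allocated node, one write to pre-existing data, then abort.
DemoResult run_demo() {
    long shared = 1;  // stands in for data that existed before the transaction
    ToyTx tx;
    long* node = static_cast<long*>(tx.tx_malloc(2 * sizeof(long)));
    if (!node) return {0, 0, shared};
    tx.write(&node[0], 10);
    tx.write(&node[1], 20);
    tx.write(&shared, 99);
    DemoResult r{tx.full_barriers, tx.elided_barriers, 0};
    tx.abort();
    r.shared_after_abort = shared;
    return r;
}
```

In this sketch, calling `run_demo()` takes the fast path for both writes to the newly allocated node and the logged path only for the write to `shared`. A compiler-side version of the same idea would prove capture statically and emit plain stores, avoiding even the range check.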
Published in: SPAA 2009