R. Manikantan, Kaushik Rajan, and R. Govindarajan
February 2011
The effectiveness of the last-level shared cache is crucial
to the performance of a multi-core system. In this paper,
we observe and make use of the DelinquentPC—Next-Use
characteristic to improve shared cache performance. We
propose a new PC-centric cache organization, NUcache,
for the shared last-level cache of multi-cores. NUcache
logically partitions the associative ways of a cache set into
MainWays and DeliWays. While all lines have access to the
MainWays, only lines brought in by a subset of delinquent
PCs, selected by a PC selection mechanism, are allowed to
enter the DeliWays. The PC selection mechanism is an intelligent
cost-benefit analysis based algorithm that utilizes
Next-Use information to select the set of PCs that can maximize
the hits experienced in DeliWays.
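
To make the MainWays/DeliWays split concrete, the following is a minimal sketch of a single cache set, not the authors' implementation. It assumes details the abstract does not state: MainWays use LRU replacement, DeliWays use FIFO replacement, and a line enters the DeliWays when it is evicted from the MainWays and was brought in by a selected PC. The class name `NUCacheSetSketch` and its parameters are hypothetical, and the set of selected PCs is passed in directly rather than computed by the cost-benefit PC selection mechanism.

```python
from collections import OrderedDict


class NUCacheSetSketch:
    """Illustrative model of one cache set split into MainWays and DeliWays."""

    def __init__(self, main_ways, deli_ways, selected_pcs):
        self.main_ways = main_ways
        self.deli_ways = deli_ways
        # Assumption: the delinquent PCs are already chosen by some
        # selection mechanism that is not modeled here.
        self.selected_pcs = set(selected_pcs)
        self.main = OrderedDict()  # tag -> PC, oldest (LRU) entry first
        self.deli = OrderedDict()  # tag -> PC, oldest (FIFO) entry first

    def access(self, tag, pc):
        """Return 'main_hit', 'deli_hit', or 'miss' for one access."""
        if tag in self.main:
            self.main.move_to_end(tag)  # refresh LRU position
            return "main_hit"
        if tag in self.deli:
            return "deli_hit"  # FIFO order is left unchanged on a hit

        # Miss: make room in the MainWays, which every line may use.
        if len(self.main) >= self.main_ways:
            victim_tag, victim_pc = self.main.popitem(last=False)
            # Only victims brought in by a selected PC may enter the DeliWays.
            if self.deli_ways > 0 and victim_pc in self.selected_pcs:
                if len(self.deli) >= self.deli_ways:
                    self.deli.popitem(last=False)  # drop oldest DeliWays line
                self.deli[victim_tag] = victim_pc
        self.main[tag] = pc
        return "miss"


if __name__ == "__main__":
    cache_set = NUCacheSetSketch(main_ways=2, deli_ways=1, selected_pcs={0x10})
    trace = [("A", 0x10), ("B", 0x20), ("C", 0x20), ("A", 0x10)]
    for tag, pc in trace:
        print(tag, hex(pc), cache_set.access(tag, pc))
```

In this toy trace, line A is loaded by the selected PC `0x10`, so it survives its eviction from the MainWays and its later reuse hits in the DeliWays; a line loaded by an unselected PC would simply be dropped.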
Performance evaluation reveals that NUcache improves
the performance over a baseline design by 9.6%, 30%
and 33% respectively for dual, quad and eight core workloads
comprised of SPEC benchmarks. We also show that
NUcache is more effective than other well-known cache-partitioning
algorithms.
In Proceedings of the International Conference on High Performance Computer Architecture (HPCA), 2011
Publisher IEEE
Type Inproceedings