Author Retrospective for “Optimizing for Parallelism and Data Locality”

  • Kathryn S McKinley

ACM/IEEE International Conference on Supercomputing (ICS) 25th Anniversary Volume |

Today there is an urgent need for algorithms, programming language systems and tools, and hardware that deliver on the potential of parallelism due to the end of Dennard scaling. This work (from my PhD dissertation, supervised by Ken Kennedy) was one of the early papers to optimize for and experimentally explore the tension between data locality and parallelism on shared memory machines. A key result was that false sharing of cache lines between processors with local caches on separate chips was disastrous to the performance and scaling of applications. This retrospective includes a short personal tour through the history of parallel computing, a discussion of locality and parallelism modeling versus a polyhedral formulation of optimizing dense matrix codes, and how this problem is still relevant to compilers today. I end with a short memorial to my deceased co-author and advisor Ken Kennedy.