Home > People > Engin Ipek > Self-Optimizing Multicore Architectures
Self-Optimizing Multicore Architectures

As industry exploits growing transistor density by integrating more and more processor cores on chip multiprocessors (CMPs), those cores will exert increasing pressure on shared system resources. Efficient resource management is becoming critical to eliminating potential bandwidth, latency, and cost barriers in CMPs. However, this will be possible only if the research community develops the infrastructure needed to tackle the complex architectural control problems that arise in managing these systems.

In parallel with these advances in computer architecture, the artificial intelligence and machine learning community has made tremendous strides in designing computer programs and algorithms that learn about their environment and improve automatically with experience. Advances in supervised and reinforcement learning have resulted in a vast body of knowledge on automatic classification and prediction, as well as the automated derivation of control policies for dynamic systems. Many of these techniques and tools have been successfully applied to important real-life problems in science and engineering, and they have the potential to have a profound impact on how we manage future CMP platforms.

Self-Optimizing Memory Controllers

One shared resource of critical importance on CMPs is off-chip DRAM memory bandwidth. Although exponential growth in transistor densities improves the raw computational power of CMPs, the off-chip bandwidth requirements of these systems grow commensurately. DRAM scheduling is a complex problem: obtaining high utilization requires circumventing many access scheduling constraints, prioritizing requests properly, and adapting to a dynamically changing memory reference stream. Unfortunately, most current memory controller designs employ fixed scheduling policies, with little or no ability to evolve over time and adapt to changing workload demands. In particular, existing schedulers are generally incapable of anticipating the long-term consequences of their actions (planning), or of generalizing from past experience to act successfully in new situations (learning). As a result, current memory controllers tend to grossly underutilize the (already limited) bandwidth available.

In [ISCA'08], we propose using machine learning to design a self-optimizing, adaptive memory controller capable of planning, learning, and continuously adapting to changing workload demands. We formulate memory access scheduling as a reinforcement learning (RL) problem. RL is a field of machine learning that studies how autonomous agents situated in stochastic environments can learn optimal control policies through interaction with their environment. We show that RL provides a general framework for self-optimizing, high-performance architectural controller design. An RL-based design approach allows the hardware designer to focus on what performance target the controller should accomplish and what system variables might be useful for deriving a good scheduling policy, rather than devising a fixed policy that describes exactly how the controller should accomplish that target. This not only eliminates much of the human design effort involved in traditional controller design, but also yields higher-performing controllers.
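
To make the RL formulation concrete, here is a minimal sketch in Python. It is not the controller from [ISCA'08]: the single-bank DRAM model (`ToyBank`), the three-command action set, the two-bit state, the reward of 1 per data transfer, and the choice of tabular Q-learning are all assumptions made for illustration. The point is only that the designer specifies a reward and a few state variables, and the scheduling policy is learned rather than hand-written.

```python
import random
from collections import defaultdict

# Hypothetical command set for a toy single-bank DRAM model.
ACTIONS = ("nop", "column_access", "open_row")


class ToyBank:
    """Assumed toy environment: one bank, one open row, a queue of requested rows."""

    def __init__(self, n_rows=4, max_queue=8, seed=0):
        self.rng = random.Random(seed)
        self.n_rows = n_rows
        self.max_queue = max_queue
        self.open_row = 0
        self.queue = []

    def state(self):
        # State variables chosen by the designer: (any request pending?, any row hit pending?)
        return (bool(self.queue), self.open_row in self.queue)

    def step(self, action):
        reward = 0.0
        if action == "column_access" and self.open_row in self.queue:
            self.queue.remove(self.open_row)  # data moves over the bus
            reward = 1.0                      # assumed reward: 1 per transfer
        elif action == "open_row" and self.queue:
            self.open_row = self.queue[0]     # precharge + activate for the oldest request
        # A new request arrives with probability 0.5 if the queue has room.
        if len(self.queue) < self.max_queue and self.rng.random() < 0.5:
            self.queue.append(self.rng.randrange(self.n_rows))
        return self.state(), reward


def train(steps=50000, alpha=0.05, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed + 1)
    bank = ToyBank(seed=seed)
    q = defaultdict(float)
    state = bank.state()
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)                        # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        next_state, reward = bank.step(action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Move the estimate toward reward plus discounted future value.
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_action(q, state):
    """Command the learned policy would issue in the given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

Calling `q = train()` and then `greedy_action(q, (True, True))` queries the learned policy for the state "requests pending, row hit available". The discount factor `gamma` is what lets the agent credit an `open_row` command, which earns no immediate reward, for the transfers it enables later; this is the planning aspect described above.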

Coordinated Multi-Resource Allocation

Three critical shared resources in virtually any CMP system are the on-chip last-level cache space, the off-chip bandwidth, and the chip’s power budget. Although several proposals that address the management of one of these microarchitectural resources have been published in the literature, coordinated management of multiple interacting resources on CMPs has remained an open research problem. In [MICRO’08], we were the first to propose a hardware mechanism to manage these three resources (1) in a coordinated fashion, (2) at runtime, and (3) with no prior knowledge of the workload (e.g., no profiling).

An important result of our work is that, due to resource interactions, independent allocation of these microarchitectural resources is systematically inferior to static resource partitioning. We therefore propose a framework that manages multiple shared CMP resources in a coordinated fashion to enforce higher-level performance objectives. We formulate global resource allocation as a machine learning (ML) problem. At runtime, our ML-based resource management scheme monitors the execution of each application and learns a predictive model of system performance as a function of allocation decisions. By learning each application’s performance response to different resource distributions, our approach makes it possible to anticipate the system-level performance impact of allocation decisions at runtime with little overhead. As a result, it becomes possible to make reliable comparisons among different points in a vast and dynamically changing allocation space, allowing us to adapt our allocation decisions as applications undergo phase changes.
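
The idea of learning a per-application performance model and then searching the allocation space can be sketched as follows. This is an illustrative toy in Python, not the mechanism from [MICRO’08]: the two applications, the resource budgets, the hidden "hardware" response in `TRUE_COEFFS`, the hand-picked features, and the use of least-squares regression as the learner are all assumptions, and only two of the three resources (cache ways and bandwidth) are modeled.

```python
import random

CACHE_WAYS, BW_UNITS = 8, 4  # hypothetical budgets shared by two applications

# Hidden ground truth standing in for real hardware (assumed, for the toy only):
# cycles per instruction = c0 + c1 / bw + c2 / (ways * bw)
TRUE_COEFFS = {"app0": (1.0, 0.2, 4.0), "app1": (1.0, 2.0, 0.5)}


def features(ways, bw):
    # The product term lets cache and bandwidth interact.
    return [1.0, 1.0 / bw, 1.0 / (ways * bw)]


def measure(app, ways, bw, rng):
    """Noisy IPC sample, as a performance counter might report it."""
    cpi = sum(c * f for c, f in zip(TRUE_COEFFS[app], features(ways, bw)))
    return (1.0 / cpi) * (1.0 + rng.uniform(-0.01, 0.01))


def allocations():
    """Every split of both resources that gives each application at least one unit."""
    for ways0 in range(1, CACHE_WAYS):
        for bw0 in range(1, BW_UNITS):
            yield {"app0": (ways0, bw0),
                   "app1": (CACHE_WAYS - ways0, BW_UNITS - bw0)}


def fit_least_squares(rows, targets):
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    n = len(rows[0])
    a = [[sum(x[i] * x[j] for x in rows) for j in range(n)]
         + [sum(x[i] * t for x, t in zip(rows, targets))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(n):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [u - factor * v for u, v in zip(a[k], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def learn_models(n_samples=30, seed=0):
    """Try random allocations, record each application's response, fit one model per app."""
    rng = random.Random(seed)
    candidates = list(allocations())
    data = {app: ([], []) for app in TRUE_COEFFS}
    for _ in range(n_samples):
        alloc = rng.choice(candidates)
        for app, (ways, bw) in alloc.items():
            data[app][0].append(features(ways, bw))
            data[app][1].append(1.0 / measure(app, ways, bw, rng))  # fit CPI
    return {app: fit_least_squares(x, y) for app, (x, y) in data.items()}


def predict_ipc(weights, ways, bw):
    return 1.0 / sum(w * f for w, f in zip(weights, features(ways, bw)))


def best_allocation(models):
    """Compare all candidate allocations using predictions only, with no new measurements."""
    return max(allocations(),
               key=lambda alloc: sum(predict_ipc(models[app], *alloc[app])
                                     for app in alloc))
```

Calling `best_allocation(learn_models())` returns a joint split of cache ways and bandwidth for both applications. Because each model takes both resources as inputs, the search accounts for their interaction; allocating cache and bandwidth with two independent single-resource policies could not. Total predicted IPC is used as the objective here purely as a stand-in for a higher-level performance target.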

 
