Home > People > Engin Ipek > Dynamically Reconfigurable Multicore Architectures
Dynamically Reconfigurable Multicore Architectures

Introducing flexibility and sophistication into architectural control mechanisms is only part of the solution to accommodating software diversity on future CMP platforms. In addition to efficient resource management, future CMP systems will also need to exhibit flexibility in their execution substrates. One essential capability that is absent from existing CMP architectures in this regard is the ability to accommodate software at different stages of parallelization by allowing the granularity of the architecture to be changed at runtime.

Core Fusion

In the short term, on-chip integration of a modest number of cores may yield high utilization on CMPs when running multiple sequential applications. In that case, sequential programs will still favor relatively large cores that can extract high levels of ILP. However, although sequential codes are likely to remain important, they alone are not sufficient to sustain long-term performance scalability. Consequently, harnessing the full potential of CMPs in the long term also necessitates the adoption of parallel programming to build future applications. As CMPs become ubiquitous, we envision a dynamic and diverse landscape of software products with very different characteristics and in different stages of development: from purely sequential, to highly parallel, and everything in between. The conflicting demands of this software diversity (in terms of core count and per-core performance), compounded by the need to support multiple such applications in a multiprogrammed environment, require a level of flexibility that is hard to come by today in the research literature, much less in the market.

In [ISCA'07], we investigate a novel reconfigurable hardware mechanism that we call core fusion. It is an architectural technique that empowers groups of relatively simple and fundamentally independent CMP cores with the ability to "fuse" into one large CPU on demand. The goal is to dynamically "synthesize" the right CMP architecture based on software needs at each point in time. We envision a core fusion CMP as a homogeneous substrate with conventional memory coherence/consistency support, where groups of up to four adjacent cores and their i- and d-caches can be fused at run-time into CPUs that have up to four times the fetch, issue, and commit width, and up to four times the i-cache, d-cache, branch predictor, and BTB size. Core fusion gracefully accommodates software diversity and incremental parallelization in CMPs. It provides a single execution model across all configurations, requires no additional programming effort or specialized compiler support, maintains ISA compatibility, and leverages mature micro-architecture technology.
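The resource scaling described above can be illustrated with a small configuration model. This is only a sketch under stated assumptions: the names `CpuResources`, `BASE_CORE`, `fuse`, and `configure` are hypothetical, the base-core numbers are placeholders rather than values from [ISCA'07], and cores are treated as a linear array so that "adjacent" means consecutive core indices. The results are the upper bounds the text gives ("up to four times"), not measured capacities.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

MAX_GROUP = 4  # from the text: up to four adjacent cores per fused CPU


@dataclass(frozen=True)
class CpuResources:
    """Front-end and cache resources of one (possibly fused) CPU."""
    width: int          # fetch, issue, and commit width
    icache_kb: int
    dcache_kb: int
    bpred_entries: int
    btb_entries: int


# Hypothetical base core: placeholder numbers, not taken from the paper.
BASE_CORE = CpuResources(width=2, icache_kb=16, dcache_kb=16,
                         bpred_entries=2048, btb_entries=1024)


def fuse(base: CpuResources, n: int) -> CpuResources:
    """Upper-bound resources of a CPU built by fusing n base cores."""
    if not 1 <= n <= MAX_GROUP:
        raise ValueError(f"group size must be between 1 and {MAX_GROUP}, got {n}")
    return CpuResources(
        width=base.width * n,
        icache_kb=base.icache_kb * n,
        dcache_kb=base.dcache_kb * n,
        bpred_entries=base.bpred_entries * n,
        btb_entries=base.btb_entries * n,
    )


def configure(base: CpuResources, group_sizes: Sequence[int],
              total_cores: int) -> List[Tuple[List[int], CpuResources]]:
    """Partition cores 0..total_cores-1 into consecutive groups and fuse each."""
    if sum(group_sizes) != total_cores:
        raise ValueError("group sizes must cover every core exactly once")
    cpus = []
    start = 0
    for n in group_sizes:
        core_ids = list(range(start, start + n))  # adjacent cores only
        cpus.append((core_ids, fuse(base, n)))
        start += n
    return cpus
```

For example, `configure(BASE_CORE, [4, 2, 1, 1], total_cores=8)` models an eight-core chip that runs one four-way fused CPU for a sequential program, one two-way fused CPU, and two unfused cores for parallel threads. Changing the list of group sizes corresponds to reconfiguring the chip at run time.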

Dynamic Core Coupling

Dynamic reconfiguration of the execution substrate can also play an important role in designing resilient CMP architectures that can adapt and reconfigure around hardware failures. Aggressive CMOS scaling will make future chip multiprocessors increasingly susceptible to transient faults, hard errors, manufacturing defects, and process variations. Existing CMP proposals that implement dual modular redundancy (DMR) do so by statically binding pairs of adjacent cores via dedicated communication channels and buffers. This can result in unnecessary performance degradation when one core is defective (in which case its DMR partner must also be disabled), or in performance/power losses when the cores of a DMR pair exhibit frequency/leakage variations. Static binding also puts additional pressure on thermal management, since DMR pairs running code with similar thermal characteristics are necessarily placed next to each other.

In [DSN'07], we describe dynamic core coupling (DCC), an architectural technique that allows arbitrary CMP cores to verify each other’s execution while requiring no static core binding at design time or dedicated communication hardware. This results in hardware that degrades half as fast as mechanisms that rely on static binding, and provides support for on-demand triple modular redundancy (TMR) to overcome hard faults. It also allows for more flexible management of thermal density and variation-induced hardware inefficiencies.
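The difference between static binding and dynamic coupling can be seen in a toy availability model. This is a sketch, not the analysis from [DSN'07]: the function names are hypothetical, static pairs are assumed to be cores (0,1), (2,3), and so on, and the model counts only how many DMR pairs remain usable after hard failures, ignoring performance, power, and communication costs.

```python
from typing import Iterable


def static_dmr_pairs(num_cores: int, failed: Iterable[int]) -> int:
    """Usable DMR pairs when cores (0,1), (2,3), ... are bound at design time.

    A pair is usable only if both of its cores are healthy, so a single
    defective core also takes its healthy partner out of service.
    """
    failed = set(failed)
    return sum(
        1
        for first in range(0, num_cores - 1, 2)
        if first not in failed and first + 1 not in failed
    )


def dynamic_dmr_pairs(num_cores: int, failed: Iterable[int]) -> int:
    """Usable DMR pairs when any two healthy cores may be coupled."""
    failed_on_chip = set(failed) & set(range(num_cores))
    healthy = num_cores - len(failed_on_chip)
    return healthy // 2
```

With eight cores and failures on cores 1 and 2, `static_dmr_pairs(8, {1, 2})` leaves two usable pairs, because each failure disables a whole pair, while `dynamic_dmr_pairs(8, {1, 2})` leaves three, because the two orphaned healthy cores can be coupled with each other. In this simplified model, static binding can lose one pair per failure in the worst case, whereas dynamic coupling loses one pair per two failures.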
