Computer and Systems Architecture — Silicon Valley

We are investigating topics in computer architecture (multicore, manycore, transactional memory, etc.), hardware accelerators, systems architecture (storage, nonvolatile memory, etc.), including both software and hardware components, and graphics. We strive to understand and optimize systems and their interactions, enabling new paradigms, accelerators, and research platforms. To that end, we build hardware and software systems that facilitate research in a variety of areas.


Automatic Mutual Exclusion
This project is exploring a new concurrent programming model, Automatic Mutual Exclusion (AME). In contrast to lock-based programming, and to other programming models built over software transactional memory (STM), we arrange that all shared state is implicitly protected unless the programmer explicitly specifies otherwise. An AME program is composed from serializable atomic fragments. We include features allowing the programmer to delimit and manage the fragments to achieve appropriate program structure and performance. The resulting programming model makes it easier to write correct code than incorrect code. It favors correctness over performance for simple programs, while allowing advanced programmers the expressivity they need.
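
As a rough illustration of the idea that shared state is protected by default, here is a minimal Python sketch. It is not the AME implementation: a single global lock stands in for the transactional machinery, which trivially makes every fragment serializable but gives up all parallelism between fragments. The names `run_atomic`, `increment`, and `worker` are hypothetical and chosen only for this example.

```python
import threading

# Assumption: one global lock stands in for the STM that a real system would use.
_global_lock = threading.Lock()


def run_atomic(fragment, *args):
    """Run one fragment so that it appears to execute in isolation."""
    with _global_lock:
        return fragment(*args)


# Shared state: the fragments below touch it without any explicit locking.
counter = {"value": 0}


def increment():
    # Read-modify-write on shared state; safe only because the caller
    # runs this function as an atomic fragment.
    counter["value"] += 1


def worker(iterations):
    for _ in range(iterations):
        run_atomic(increment)


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["value"])  # 4000
```

The programmer writes `increment` with no locks at all; protection comes from where the fragment boundaries are drawn, which is the structural point of the model.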

BEE3 stands for the Berkeley Emulation Engine version 3. The BEE3 system is a 2U chassis containing four tightly coupled FPGAs that serves as a vehicle for computer architecture research. In particular, the BEE3 is the target platform for the Research Accelerator for Multiple Processors (RAMP). RAMP is a collaboration among six universities (Berkeley, Stanford, UW, UT, CMU, and MIT) and several industry partners, including Microsoft Research, Xilinx, Sun Microsystems, and IBM. The BEE3 is a flexible platform: research can be done on a single 2U system, on 64 interconnected 2U systems, or on anything in between. The BEE3 facilitates research in multiple areas: computer architecture, systems, OS and software, memory hierarchy and storage, and various application and algorithm accelerators, to name a few.

Critical Path
This project investigates the use of the global critical path for analysis and optimization of complex, highly parallel hardware devices. We are exploring methods for automatically extracting the critical path, and for automatically using the critical path for optimizing the device power and performance.
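
The project's own extraction methods are not described here, so the following is only a textbook sketch of what a critical path is: the longest latency-weighted chain through a dependency graph. It assumes the device's activity has already been captured as a directed acyclic graph; the function `critical_path` and the example node names and latencies are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path(latency, deps):
    """Return (total_latency, path) for the longest chain in a DAG.

    latency: dict mapping node -> cost of that node
    deps:    dict mapping node -> set of nodes it must wait for
    """
    graph = {node: deps.get(node, ()) for node in latency}
    finish = {}     # earliest finish time of each node
    best_pred = {}  # predecessor that determines that finish time
    for node in TopologicalSorter(graph).static_order():
        preds = deps.get(node, ())
        pred = max(preds, key=lambda p: finish[p], default=None)
        best_pred[node] = pred
        finish[node] = latency[node] + (finish[pred] if pred is not None else 0)

    # Walk back from the node that finishes last.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return total, path[::-1]


# Hypothetical example: a small pipeline with two parallel branches.
latency = {"fetch": 2, "decode": 1, "alu": 3, "mem": 5, "writeback": 1}
deps = {
    "decode": {"fetch"},
    "alu": {"decode"},
    "mem": {"decode"},
    "writeback": {"alu", "mem"},
}
total, path = critical_path(latency, deps)
print(total, path)  # 9 ['fetch', 'decode', 'mem', 'writeback']
```

In this example, speeding up `alu` would not change the total, while speeding up `mem` would. That distinction is what makes the critical path useful for deciding where to spend power or optimization effort.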

The Dryad Project aims to advance the state of the art in writing and managing distributed applications. Converting a sequential and/or single-machine program into a form in which it can be executed in a concurrent, potentially distributed environment is known to be hard. One long-standing technique to address this is to decompose the program into two logical layers: a high-level skeleton which expresses the data-flow, distribution and concurrency properties; and a collection of subroutines each of which is scheduled by the high-level layer, and executes locally with restricted communications to the rest of the program. The Dryad project is developing a programming model which adopts this approach to tackle both data- and compute-intensive problems, scaling from future single-machine many-core PCs up to large-scale data-centers.
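
The two-layer decomposition described above can be sketched in a few lines of Python. This is not Dryad's API: `run_dataflow`, `count_words`, and `merge_counts` are hypothetical names, and the skeleton here runs every vertex sequentially in one process, whereas a real engine would schedule vertices across cores or machines.

```python
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+


def run_dataflow(vertices, edges, inputs):
    """High-level layer: run each vertex once its upstream vertices are done.

    vertices: dict name -> subroutine taking a list of input values
    edges:    dict name -> ordered list of upstream vertex names
    inputs:   dict name -> external input for vertices with no upstream
    """
    graph = {name: edges.get(name, []) for name in vertices}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        upstream = edges.get(name, [])
        args = [results[u] for u in upstream] if upstream else [inputs[name]]
        results[name] = vertices[name](args)
    return results


# Low-level layer: subroutines that only see their own inputs.
def count_words(args):
    return Counter(args[0].split())


def merge_counts(args):
    return sum(args, Counter())


vertices = {"count0": count_words, "count1": count_words, "merge": merge_counts}
edges = {"merge": ["count0", "count1"]}
inputs = {"count0": "a b a", "count1": "b c"}

results = run_dataflow(vertices, edges, inputs)
print(sorted(results["merge"].items()))  # [('a', 2), ('b', 2), ('c', 1)]
```

Because the subroutines communicate only through the edges, the skeleton is free to decide where and when each one runs without changing their code.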

The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for ordinary programmers. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ). Dryad provides reliable, distributed computing on thousands of servers for large-scale data parallel applications. LINQ enables developers to write and debug their applications in a SQL-like query language, relying on the entire .NET library and using Visual Studio. DryadLINQ is a simple, powerful, and elegant programming environment for writing large-scale data parallel applications running on large PC clusters.

The goal of the Flashlight project at MSR Silicon Valley is to explore existing and new flash architectures and to build tools to aid in that endeavor.

We present a practical FPGA-based accelerator for solving Boolean Satisfiability (SAT) problems. Unlike previous efforts in hardware-accelerated SAT solving, our design focuses on accelerating the Boolean Constraint Propagation (BCP) part of the SAT solver, leaving heuristic choices such as branching order, restart policy, learning, and backtracking to software. Our approach uses an application-specific architecture instead of an instance-specific one, which avoids time-consuming FPGA synthesis for each SAT instance. Through careful pipelining and the avoidance of global signal wires, our design achieves a much higher clock frequency than previous work. Our co-processor can load SAT instances in milliseconds, can handle SAT instances with tens of thousands of variables and clauses on a single FPGA, and can easily be scaled up to handle more clauses by using multiple FPGAs. Our evaluation using a cycle-accurate simulator shows that the FPGA co-processor achieves a speedup of 3.7x to 38.6x on BCP compared with state-of-the-art software SAT solvers. A single-FPGA implementation of the co-processor consumes only 7.16 watts in the worst case, an order of magnitude less than a modern CPU.
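
For readers unfamiliar with BCP, the following Python sketch shows what the operation computes in software. It says nothing about the pipelined FPGA design itself. It assumes clauses are lists of nonzero integers where `-v` denotes the negation of variable `v`; the function name `unit_propagate` is hypothetical.

```python
def unit_propagate(clauses, assignment):
    """Apply unit propagation until nothing changes or a conflict appears.

    clauses:    list of clauses, each a list of nonzero ints (-v means NOT v)
    assignment: dict variable -> bool for the variables decided so far
    Returns (conflict, assignment), where assignment is an extended copy.
    """
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var, wanted = abs(lit), lit > 0
                if var not in assignment:
                    unassigned.append(lit)
                elif assignment[var] == wanted:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                # Every literal is false: the current assignment conflicts.
                return True, assignment
            if len(unassigned) == 1:
                # Unit clause: its last free literal is forced to be true.
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return False, assignment


# Deciding x1 = True forces x2, then x3, then NOT x4.
chain = [[-1, 2], [-2, 3], [-3, -4]]
print(unit_propagate(chain, {1: True}))
# (False, {1: True, 2: True, 3: True, 4: False})

# Here x1 = True forces x2, which then falsifies the second clause.
print(unit_propagate([[-1, 2], [-1, -2]], {1: True})[0])  # True
```

The software side of the solver would call a routine like this after every branching decision, which is why BCP is a natural target for acceleration.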