Approaching the End of Moore’s Law: Time to Reinvent the System Stack?

Wednesday, July 16, 2014 | Redmond, WA, United States

The rapid, continuous, and economically viable scaling of semiconductor technology has, in multiple cases, outpaced the system stack's ability to evolve adequately. However, the growing economic difficulty of extending scaling may make further efficiency gains via semiconductor technology hard to sustain. The recent movements toward multicore, specialization, and optimized storage stacks follow from this phenomenon. The demands that applications and their data place on storage and processing capabilities are not expected to stop growing, putting even more pressure on system efficiency. This full-day workshop will provide insights on the efficiency issue through examples, and then promote discussion of opportunities in this area through a mix of driving application scenarios and synergistic opportunities across the system stack.


  • Date and time: July 16, 2014, 9:00 A.M.–5:30 P.M.
  • Location: Microsoft Conference Center (Building 33), 16070 NE 36th Way, Redmond, WA 98052
    We will be in the St. Helen's room.
  • Transportation: For those staying at the Hyatt, the buses will start boarding at 7:50 A.M. The last bus leaves at 8:30 A.M. In the afternoon, return buses will start boarding at 5:40 P.M. and leave at 5:50 P.M., returning to the Hyatt.
  • Meals: Breakfast, lunch, and snacks/refreshments will be served. Breakfast starts at 8:30 A.M.
  • Registration: Please check in at the registration desk. There will be a separate badge for this event.





8:30–9:00 A.M.

Breakfast and Registration


9:00–9:30 A.M.

Computing's Energy Problem (and what we can do about it)

Almost all electronic systems are energy limited, from the computers in your Bluetooth headset to the ones that answer the questions that you type into Google. After reviewing why computing became power limited even before the energy scaling of gates slowed down, I will explain why I can’t count on technology to save the day this time.


We next look at making a modern web server farm more efficient. As CPUs have improved, memory and I/O system energy now dominate. While reducing this energy is possible, the data raise the question of how ASICs can be 1000x more efficient than processors if memory energy dominates. The secret lies in the applications ASICs target: those with many short-integer operations and extremely local storage. The rest of the talk will explain how we have leveraged this insight to create a programmable unit with ASIC-like efficiency, and to derive constraints for other energy-efficient customized hardware.

Professor Mark Horowitz, Stanford University

Mark Horowitz is the Yahoo! Founders Professor at Stanford University and was chair of the Electrical Engineering Department from 2008 to 2012. He co-founded Rambus, Inc. in 1990 and is a fellow of the IEEE and the ACM and a member of the National Academy of Engineering and the American Academy of Arts and Sciences. Dr. Horowitz's research interests are quite broad, ranging from applying EE and CS analysis methods to problems in molecular biology to creating new design methodologies for analog and digital VLSI circuits.


9:30–9:50 A.M.

Willow: Making Storage Semantics Flexible

Emerging, fast non-volatile memories have vastly improved performance and increased flexibility in storage systems compared to conventional disks, but storage interfaces have not kept up. The performance of these memory technologies places greater pressure on system storage software, while their increased flexibility (e.g., in terms of high-performance random access) demands a richer interface than the read/write semantics that disks support. To understand how software must adapt and how interfaces must change, we have developed a prototype high-performance storage system called Willow. The Willow hardware interface and software stack work together to remove legacy storage software overheads and make programmability the central abstraction for accessing storage, allowing applications to customize Willow's semantics using software. Adding programmability results in large application-level performance gains and vastly expands the number of applications that can benefit from customized storage interfaces.

Professor Steven Swanson, University of California, San Diego

Steven Swanson is an associate professor in the Department of Computer Science and Engineering at the University of California, San Diego, and the director of the Non-volatile Systems Laboratory. His research interests include the systems, architecture, security, and reliability issues surrounding non-volatile, solid-state memories. He also co-leads projects to develop low-power co-processors for irregular applications and to devise software techniques for using multiple processors to speed up single-threaded computations. In previous lives he has worked on scalable dataflow architectures, ubiquitous computing, and simultaneous multithreading. He received his PhD from the University of Washington in 2006 and his undergraduate degree from the University of Puget Sound.

9:50–10:10 A.M.

Applying Three Principles When Reinventing the System Stack for Efficiency

We outline three principles of efficiency that have guided our recent research on system software. First, the proximity principle says that waste shall be processed at the source. It requires that programmability be introduced close to the source of any data path. Second, the proportionality principle says that you do not kill a chicken with an ox cleaver. It requires that a common abstraction be built on top of heterogeneous resources. Third, the parsimony principle says that idle hardware units should sleep. It suggests that device drivers be relieved of the responsibility of power management.

Professor Lin Zhong, Rice University

Lin Zhong received his B.S. and M.S. from Tsinghua University and his Ph.D. from Princeton University. He has been with Rice University since September 2005, where he is currently an associate professor. He was a visiting researcher with Microsoft Research in the summer of 2011 and from March to December 2012. At Rice, he leads the Efficient Computing Group to make computing, communication, and interfacing more efficient and effective. He is a recipient of the National Science Foundation CAREER Award and of best paper awards from ACM MobileHCI 2007, IEEE PerCom 2009, ACM MobiSys 2011, 2013, and 2014, and ACM ASPLOS 2014. He received the ACM SIGMOBILE Rockstar Award in 2014.

10:10–10:30 A.M.

Tune, Rewrite, Reinvent

Our software infrastructure is insanely complex, with hundreds of millions of lines of code running on our phones, computers, networks, and servers. Its shortcomings and flaws are abundantly apparent, so it is not surprising that many developers' and researchers' instinct is to start over and do it better the next time around. In industry, though, the opposite approach, of muddling through with admittedly imperfect code, usually prevails. This talk touches on the question of when to tune, rewrite, or reinvent a piece of software, with a strong focus on several examples of interest to the architecture and systems communities.

James Larus, EPFL

James Larus is Professor and Dean of the School of Computer and Communication Sciences (IC) at EPFL (École Polytechnique Fédérale de Lausanne). Prior to that position, Larus was a researcher and manager in Microsoft Research for over 16 years and an assistant and associate professor in the Computer Sciences Department at the University of Wisconsin, Madison.


Larus has been an active contributor to the programming languages, compiler, software engineering, and computer architecture communities. He has published over 100 papers (with 9 best and most influential paper awards), received 30 US patents, and served on numerous program committees and NSF, NRC, and DARPA panels. His book, Transactional Memory (Morgan & Claypool), appeared in 2007. Larus became an ACM Fellow in 2006.

10:30–11:00 A.M.

Break


11:00–11:20 A.M.

Rethinking Systems Architecture for Scale-Out Workloads

Current commodity system architectures are based on dual-socket designs, which haven't changed much for almost two decades. A new class of systems based on SoCs is emerging, and this presents the opportunity to rethink the system architecture stack for compute, storage, and network. This talk will provide an overview of the opportunities and challenges in this regard.

Kushagra Vaid, General Manager for Server Engineering, Microsoft Cloud & Enterprise Division

Kushagra Vaid is the General Manager for Server Engineering in Microsoft’s Cloud & Enterprise division. He is responsible for driving hardware R&D, engineering designs, deployments and support for Microsoft’s cloud scale services (such as Bing, Azure, Office 365, and others) across a global datacenter footprint.


11:20–11:40 A.M.

Programming a Reconfigurable Fabric for Large-Scale Datacenter Services

The ending of Moore’s Law will have a profound impact on datacenter operators, who have traditionally relied on steady advances in processor performance and efficiency to make improved services economically viable. The Catapult reconfigurable fabric at Microsoft ushers in a new datacenter architecture that marries programmable software with efficient and low-power programmable hardware (i.e., FPGAs) at scale. Catapult is deployed on a bed of 1,632 servers and was shown to double the ranking throughput of a large-scale web search workload (Bing).


Exploiting a large-scale reconfigurable fabric such as Catapult required radical changes to conventional software-hardware contracts, programming methodology, and coordination between hardware and software teams at Microsoft. Some challenges encountered were: (1) the need to accommodate rapid changes to models and algorithms, (2) the need for flexible but well-defined contracts and interfaces between high-level software and low-level accelerator engines, and (3) mitigating the productivity challenges associated with programming low-level accelerators. This talk will present some of the solutions used in the Bing pilot, and offer suggestions and directions for future work.

Eric Chung, Researcher, Microsoft Research Technologies

Eric S. Chung is a Researcher in the Microsoft Research Technologies lab at Redmond. Eric is a member of the Catapult team and is interested in prototyping and productively harnessing novel hardware systems that incorporate specialized and reconfigurable hardware such as FPGAs. Eric received his PhD in 2011 from Carnegie Mellon University and was the recipient of the Microsoft Research Fellowship in 2009. His paper on CoRAM, a new memory abstraction and architecture for programming FPGAs more productively, received the best paper award at FPGA 2011.

11:40–12:00 P.M.


Evolving a Natural User Interface Sensor from Game Console to Personal Systems: Scaling Size, Power and Compute for Embedded Applications 

We describe the 3D Time-of-Flight image sensor system that was developed to enable a Natural User Interface (NUI) for the Xbox One console. To enable NUI features on smaller, more personal systems, however, the sensor technology must evolve to fit within the much tighter space constraints and power budgets of tablets and smartphones. We will describe what can be done in hardware and software to enable this evolution and what that means for hardware architectures going forward.

Pat O’Connor, Silicon Development, Microsoft Devices Division 

Pat O'Connor is a Partner, Director of Engineering in the Silicon Development group in Devices Division, responsible for hardware and software development of sensors and custom silicon in Microsoft's Natural User Interface platforms, such as Kinect. Prior to joining Microsoft, Pat was VP of Engineering at Canesta Inc., a pioneer in the area of Time Of Flight 3D Image Sensors, and held previous engineering positions at Parthus Technologies, Aureal Semiconductor, Rockwell International and Analog Devices. Pat holds a BSc in Electrical Engineering from Trinity College, Dublin.  

12:00–12:20 P.M.

Energy Scalability in Low-Power Sensing Systems Through Data Compression

Insufficient computation ability, scant storage capacity, high communication energy, and low network bandwidth are important limitations that affect system-level performance. These constraints impact system stacks at all levels of design complexity, ranging from multi-node computing clusters to ultra-low-power sensing platforms. Among such platforms, one common factor that has run counter to Moore's law is the amount of data handled by these systems, which has clearly grown over time. In this presentation, we will focus on low-power sensing systems in a mobile context. We will see how compressing data can help keep this growth in check and thereby alleviate some of the above limitations. With some forms of compression, however, the data are altered, and analyzing them further down the system stack requires us to reconstruct or un-compress the signals, which may not be feasible due to the relatively high energy costs of reconstruction. I will describe emerging sensing system architectures that are starting to explore ways of directly analyzing compressed data without reconstruction. Besides avoiding reconstruction, such transformations also allow us to build energy-scalable systems through data compression. I will present results from an integrated circuit implementation of one exemplary low-power system that show computational energy scaling in the range of 1.2–214 µJ, depending on the amount of compression (2–24×). Thus, we will see that this approach has the potential to prolong battery life in sensing systems by up to 5×.

Mohammed Shoaib, Researcher, Microsoft Research Redmond

Mohammed Shoaib is a member of the Sensing and Energy Research Group at MSR. His work focuses on the VLSI design of machine-learning and signal-processing systems for low-power sensing applications. In the past decade, machine-learning research has progressed faster than ever, enabling analytics on a wide variety of data. This has created a push towards the use of learning algorithms even in the smallest of devices, including wearables. On the other hand, however, these devices are also starting to face stricter energy and performance constraints. Thus, enabling hardware support for learning algorithms on energy-constrained devices is a deep and complex area of research, especially given the myriad of technologies, architectural/algorithmic options, and performance trade-offs. Currently, Shoaib is working on enabling image-processing and computer-vision algorithms that employ machine learning on wearable devices. He received his Ph.D. and M.A. degrees in Electrical Engineering from Princeton University, and his B.Tech. and M.Tech. dual degree in Electrical Engineering from IIT Madras. He is a member of the IEEE and the ACM.


12:20–1:00 P.M.

Lunch


1:00–2:30 P.M.


Break-out Sessions 

Suggested discussion topics (5–9 people per group); a sign-up board will be available in the morning and during lunch:

  • Exposing more of memory to software
  • Hardware specialization and heterogeneity
  • Datacenter efficiency
  • Storage stacks: can we do better?
  • Network stacks
  • Emerging applications in need of efficiency
2:30–3:00 P.M.

Break
3:00–4:00 P.M. 

Report out 

4:00–5:30 P.M. 

Panel – Cooperative Game Theory for Computing Performance: Will the Post-Dennard Era Finally Incentivize Hardware–Software Cooperation?

Each generation of applications has seemingly insatiable demand for more processing power and storage. However, without Dennard scaling or Moore's law, executing demanding applications, much less improving their performance and energy efficiency, will require increasingly sophisticated programmers, algorithms, compilers, runtime systems, and specialized hardware. In the past, the ecosystem flourished in part because application writers, system developers, and architects innovated in isolation. Will it be possible to continue this separation of concerns and improve future system capabilities? Panelists will argue for or against cooperative hardware–software design. Attendees will contribute their opinions, and then we will vote.

Moderator: Kathryn McKinley

Kathryn S. McKinley is a Principal Researcher at Microsoft. She was previously an Endowed Professor of Computer Science at The University of Texas at Austin where she graduated 18 PhD students. Her research interests span programming language implementation, architecture, security, performance, and energy. She and her collaborators have produced widely used tools: the DaCapo Java Benchmarks, TRIPS Compiler, Hoard memory manager, MMTk garbage collector toolkit, and the Immix garbage collector. Her awards include the 2012 ACM SIGPLAN Programming Languages Software Award, and Best or Most Influential papers at ASPLOS, OOPSLA, ICS, SIGMETRICS, IEEE Top Picks, and CACM Research Highlights. Her service includes DARPA ISAT member (2012-present), CRA Board member (2012-present), and CRA-W co-chair (2011-present). Dr. McKinley was honored to testify to the House Science Committee (Feb. 14, 2013). She has a husband and three sons. She is an IEEE Fellow and an ACM Fellow. 



Alexandra Fedorova, Simon Fraser University

Alexandra (Sasha) Fedorova is an Associate Professor of Computer Science at Simon Fraser University. She earned her Ph.D. at Harvard in 2006 with a thesis on addressing limitations of multicore systems using software techniques. While completing her Ph.D., she interned at Sun Labs, where she co-authored the simulator for Sun's multicore processor Niagara. In 2006, Sasha joined the School of Computing Science at SFU, where she co-founded the Systems, Networking and Architecture (SYNAR) research lab and currently leads a group of 12 graduate students and postdocs. Sasha has more than 30 publications in top scientific venues, and her research is supported by the Natural Sciences and Engineering Research Council of Canada, the British Columbia Innovation Council, Oracle, Google, Intel, STMicroelectronics, Research in Motion, and Electronic Arts. Sasha is the recipient of the 2011 Anita Borg Early Career Award, and in 2012 she was named an Alfred P. Sloan Fellow.


Karin Strauss, Researcher, Microsoft Research Redmond

Karin Strauss' research interests include computer architecture, systems, and bio-compatible computation. Lately, she has been focused on the design of future memory systems, especially those with main memories that wear out as they are written. Karin is an IEEE and ACM senior member, the author of over 30 papers published in top venues, including IEEE Micro's Top Picks in Computer Architecture, and an inventor on over 20 patents. She received her PhD from the University of Illinois at Urbana-Champaign and worked at AMD Research before joining Microsoft Research.


Steven Blackburn, Australian National University

Steve Blackburn is a professor in the Research School of Computer Science at the Australian National University. His research interests include programming language implementation, architecture, and performance analysis. Steve has been heavily involved in two major research infrastructure projects: the DaCapo benchmark suite and Jikes RVM.


Tim Sherwood, University of California, Santa Barbara

Tim Sherwood is a Professor at UC Santa Barbara specializing in the development of processors exploiting novel technologies (e.g., plasmonics and memristors), provable properties (e.g., information flow security or deadlock freedom), and hardware-aware algorithms (e.g., high-throughput string scanning or new logic representations). He is the recipient of the Northrop Grumman Teaching Excellence Award, the NSF CAREER Award, and the UCSB Academic Senate Distinguished Teaching Award, and on six separate occasions his papers have been selected by IEEE Micro as a "Top Pick" for the year. Prior to joining UCSB in 2003, he graduated with a B.S. in Computer Science from UC Davis (1998), and received his M.S. and Ph.D. from UC San Diego (2003).



Microsoft Conference Center, Building 33
16070 NE 36th Way, Redmond, WA 98052

This event is co-located with Faculty Summit 2014. Learn about other co-located events.