The DC Genome Project is a joint project between Microsoft Research (MSR) and Microsoft Global Foundation Services (GFS). Its goal is to use data-driven and feedback control approaches to monitor, analyze, and improve the operational efficiency of data centers, to maximize their capacity utilization, and to minimize their environmental impact.
Genomotes are customized wireless sensor nodes for data center environmental sensing. They communicate over IEEE 802.15.4 radios. To ease deployment and reduce the number of contending wireless nodes, we use a master-slave chained design. The master node is a wireless node that also has a serial interface for communicating with the slave nodes. Each slave node has two serial interfaces, one up-chain and one down-chain.
RACNet is the network that connects the wireless Genomotes for data collection. Wireless sensor networking faces significant challenges in a data center environment. The number of nodes in the communication neighborhood can be very large. In our experience, between 50% and 80% of the nodes can hear (and therefore interfere with) each other. RACNet uses multiple communication channels and a token-passing mechanism to avoid congestion in the network. We achieve a data yield of more than 99.5% in production deployments.
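The combination of multiple channels and token passing can be illustrated with a toy scheduler. This is only a sketch under stated assumptions, not RACNet's actual protocol: the round-robin channel assignment and the function names `assign_channels` and `token_schedule` are hypothetical. It shows why a per-channel token removes contention: in every time slot, exactly one node per channel is allowed to transmit.

```python
from collections import defaultdict


def assign_channels(node_ids, num_channels):
    """Spread nodes over channels round-robin (an assumed policy)."""
    channels = defaultdict(list)
    for index, node in enumerate(node_ids):
        channels[index % num_channels].append(node)
    return dict(channels)


def token_schedule(channels):
    """Build one full token rotation.

    Each time slot maps channel -> the single node holding that
    channel's token, so no two nodes on a channel transmit at once.
    """
    longest = max(len(nodes) for nodes in channels.values())
    slots = []
    for step in range(longest):
        # The token advances one position per slot and wraps around.
        slots.append({ch: nodes[step % len(nodes)]
                      for ch, nodes in channels.items()})
    return slots


if __name__ == "__main__":
    channels = assign_channels(range(10), num_channels=3)
    for slot_number, slot in enumerate(token_schedule(channels)):
        print(slot_number, slot)
```

With ten nodes on three channels, every node gets a turn within four slots, and adding channels shortens the rotation because the channels run in parallel.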
Cypress Data Management
One direct consequence of taking a data-driven approach to data center management is the need to handle the massive amount of data generated by sensors (including soft sensors such as application performance counters) and other information sources. Cypress is a compressive data management framework for time series streams. It decomposes each time series into multiple compressed feature streams, called trickles. Trickles can be further grouped together to exploit spatial correlation for additional compression. Common queries such as select, trend, histogram, and correlation can be answered directly from the compressed trickles, without reconstructing the raw data.
Using the data collected from servers and their environments, we are looking at improving data center operation efficiency through static and dynamic server provisioning. RackPacker is a data-driven static provisioning approach that takes advantage of stationary and statistical variations in workload to improve the utilization of provisioned power. AutoShift is a dynamic provisioning approach that migrates workload to a minimum number of servers and turns off the servers that are not needed. We use a seasonal time series regression technique for load prediction and dynamically skew the load toward active servers (see the NSDI '08 publication).
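The idea of answering queries from compressed feature streams can be sketched in a few lines. The decomposition below is not the Cypress algorithm itself; it is a simplified stand-in that assumes just two trickles, a coarse stream of block means and a sparse stream of large residuals. The names `decompose`, `trend`, and `reconstruct`, and the `block` and `threshold` parameters, are hypothetical.

```python
def decompose(series, block=4, threshold=3.0):
    """Split a series into a coarse trickle and a sparse spike trickle."""
    coarse = []   # one mean per block of samples
    spikes = {}   # sample index -> residual, kept only when large
    for start in range(0, len(series), block):
        chunk = series[start:start + block]
        mean = sum(chunk) / len(chunk)
        coarse.append(mean)
        for offset, value in enumerate(chunk):
            residual = value - mean
            if abs(residual) > threshold:
                spikes[start + offset] = residual
    return coarse, spikes


def trend(coarse):
    """Answer a trend query from the coarse trickle alone."""
    return coarse[-1] - coarse[0]


def reconstruct(coarse, spikes, length, block=4):
    """Approximate the raw series; small residuals are lost."""
    return [coarse[i // block] + spikes.get(i, 0.0) for i in range(length)]


if __name__ == "__main__":
    temps = [20.0, 20.0, 20.0, 20.0, 22.0, 22.0, 30.0, 22.0]
    coarse, spikes = decompose(temps)
    print(coarse, spikes, trend(coarse))
    print(reconstruct(coarse, spikes, len(temps)))
```

Here eight raw samples shrink to two block means and one spike, and the trend query touches only the two means. The reconstruction is lossy, which is the trade-off such a scheme accepts in exchange for compression.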
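The dynamic provisioning loop can be sketched as "predict the load, then size the active server set". The example below is a minimal illustration, not the published method: it swaps in a plain seasonal mean for the seasonal regression model, and the names `seasonal_forecast` and `servers_needed`, the per-server capacity, and the 20% headroom are all assumptions.

```python
import math


def seasonal_forecast(history, period):
    """Predict the next sample as the mean of past samples at the same phase.

    A seasonal mean is used here as a simple stand-in for a seasonal
    regression model.
    """
    if len(history) < period:
        raise ValueError("need at least one full period of history")
    phase = len(history) % period
    same_phase = history[phase::period]
    return sum(same_phase) / len(same_phase)


def servers_needed(predicted_load, capacity_per_server, headroom=0.2):
    """Smallest server count that carries the load plus a safety margin."""
    required = predicted_load * (1 + headroom) / capacity_per_server
    return max(1, math.ceil(required))


if __name__ == "__main__":
    # Two "days" of three samples each (hypothetical request rates).
    history = [100, 400, 200, 120, 420, 220]
    load = seasonal_forecast(history, period=3)
    print(load, servers_needed(load, capacity_per_server=50))
```

Servers beyond the computed count are candidates for shutdown until the next forecast raises the requirement again.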
Joint Resource Control
The computing (cyber) systems and the physical systems in a data center have their own distinct dynamics. A user request must be served in milliseconds, while some facility components have a lifetime of over 15 years. Coordinating resource control across these nine orders of magnitude in time scale is a great challenge. We envision a holistic control framework in which information and constraints are shared across the physical and computing boundaries to maximize potential energy savings. For example, load balancers can be designed to give more load to the servers that are easier to cool. Workload (and thus power) spikes can be clipped to protect the UPS in an oversubscribed environment. A critical component of this vision is the joint modeling of the various dynamics (continuous time, discrete events, queueing, etc.) and a framework for analyzing their interaction.
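The two examples in this paragraph, cooling-aware load balancing and power clipping, can be sketched as follows. This is an illustration under assumptions rather than a described implementation: thermal headroom is approximated as the gap between a server's inlet temperature and a hypothetical 27 degree Celsius limit, and the names `cooling_aware_shares` and `clip_power` are invented for the example.

```python
def cooling_aware_shares(inlet_temps_c, max_inlet_c=27.0):
    """Return each server's fraction of load, proportional to thermal headroom.

    Headroom is the assumed limit minus the measured inlet temperature,
    so cooler (easier to cool) servers receive more load.
    """
    headroom = {server: max(0.0, max_inlet_c - temp)
                for server, temp in inlet_temps_c.items()}
    total = sum(headroom.values())
    if total == 0:
        # No server has headroom: fall back to an even split.
        return {server: 1.0 / len(headroom) for server in headroom}
    return {server: h / total for server, h in headroom.items()}


def clip_power(requested_watts, cap_watts):
    """Clip a power demand so it never exceeds the provisioned cap."""
    return min(requested_watts, cap_watts)


if __name__ == "__main__":
    shares = cooling_aware_shares({"a": 21.0, "b": 24.0, "c": 27.0})
    print(shares)
    print(clip_power(1200, cap_watts=1000))
```

In this sketch the physical side (inlet temperatures, power cap) feeds directly into a decision made on the computing side, which is the kind of information sharing the framework calls for.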
- Microsoft Global Foundation Services: Mike Manos, Daniel Costello, Amaya Souarez, Patrick Yantz, Jeff O'Reilly, Kelly Roark, Sean James, Christian Belady, Phil Suver, Charl Kunzmann
- Johns Hopkins University: Andreas Terzis
- Harbin Institute of Technology (China): Qiang Wang
- Intern Students: Gong Chen, Wenbo He, Mike Liang, Lakshmi Ganesh, Galen Reeves, Sorabh Gandhi
- Lei Li, Chieh-Jan Mike Liang, Jie Liu, Suman Nath, Andreas Terzis, and Christos Faloutsos, ThermoCast: A Cyber-Physical Forecasting Model for Data Centers, in KDD '11: 17th Annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, August 2011
- Jie Liu, Michel Goraczko, Sean James, Christian Belady, Jiakang Lu, and Kamin Whitehouse, The Data Furnace: Heating Up with Cloud Computing, in 3rd USENIX Workshop on Hot Topics in Cloud Computing, USENIX, June 2011
- Jie Liu, Automatic Server to Circuit Mapping with The Red Pills, in 2010 Workshop on Power Aware Computing and Systems (HotPower '10), USENIX, 3 October 2010
- Chieh-Jan Mike Liang, Jie Liu, Liqian Luo, Andreas Terzis, and Feng Zhao, RACNet: A High-Fidelity Data Center Sensing Network, in Proceedings of The 7th ACM Conference on Embedded Networked Sensor Systems (SenSys 2009), Association for Computing Machinery, Inc., November 2009
- Galen Reeves, Jie Liu, Suman Nath, and Feng Zhao, Managing Massive Time Series Streams with Multi-Scale Compressed Trickles, in VLDB 2009: Proceedings of the 35th Conference on Very Large Data Bases, Very Large Data Bases Endowment Inc., August 2009
- Galen Reeves, Jie Liu, Suman Nath, and Feng Zhao, Cypress: Managing Massive Time Series Streams with MultiScale Compressed Trickles, no. MSR-TR-2009-79, 25 June 2009
- Chieh-Jan Mike Liang, Jie Liu, Liqian Luo, and Andreas Terzis, Poster Abstract: Enabling Reliable and High-Fidelity Data Center Sensing, in IPSN, ACM/IEEE, April 2009
- Lakshmi Ganesh, Jie Liu, Suman Nath, and Feng Zhao, Unleash Stranded Power in Data Centers with RackPacker, in Workshop on Energy-Efficient Design (in conjunction with ISCA), 2009
- Jie Liu, Feng Zhao, Jeff O'Reilly, Amaya Souarez, Michael Manos, Chieh-Jan Mike Liang, and Andreas Terzis, Project Genome: Wireless Sensor Network for Data Center Cooling, in The Architecture Journal, Microsoft, December 2008
- Chieh-Jan Liang, RACNet: Reliable ACquisition Network for High-Fidelity Data Center Sensing, no. MSR-TR-2008-144, October 2008
- MIT Technology Review: Saving Energy in Data Centers
- Popular Science (PopSci.com): Microsoft Practices Sensor-ship
- KOMO4 News: Microsoft Research: From cells to solar system
- IDG News (appeared on PC World, Computer World, etc.): Microsoft shows off Data-center Monitoring System
- New Scientist: Delaying data could cut net's carbon footprint