The Cloud Systems group within the eXtreme Computing Group in Microsoft Research aspires to make deliberate, breakthrough innovations in software systems and cloud technologies. We are currently working on a programming model and distributed runtime environment enabling rapid cloud-scale development, tools for analyzing large-scale data center networks, and system software to reduce operation costs and increase performance.
Striving to deliver a highly reliable cloud network infrastructure
Orleans is a software framework for building client + cloud applications. Orleans encourages use of simple concurrency patterns that are easy to understand and implement correctly, building on an actor-like model with declarative specification of persistence, replication, and consistency and using lightweight transactions to support the development of reliable and scalable client + cloud software.
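To make the actor-like model concrete, here is a minimal sketch in Python. It is not Orleans code and does not use the Orleans API. The class `CounterActor`, its `send` method, and the message names are hypothetical, chosen only for illustration. It shows the one idea that keeps such concurrency patterns simple: each actor owns private state and handles messages one at a time, so callers need no locks. Persistence, replication, and transactions are not modeled.

```python
import asyncio


class CounterActor:
    """Hypothetical actor: private state, one message processed at a time."""

    def __init__(self):
        self._count = 0                  # state touched only by the actor's own loop
        self._mailbox = asyncio.Queue()  # incoming messages wait here

    async def run(self):
        # Process messages strictly one after another, so no locks are needed.
        while True:
            message, reply = await self._mailbox.get()
            if message == "increment":
                self._count += 1
                reply.set_result(self._count)
            elif message == "stop":
                reply.set_result(self._count)
                return
            else:
                reply.set_exception(ValueError(f"unknown message: {message!r}"))

    async def send(self, message):
        # Callers never touch the state directly; they enqueue a message
        # and await the reply.
        reply = asyncio.get_running_loop().create_future()
        await self._mailbox.put((message, reply))
        return await reply


async def demo():
    actor = CounterActor()
    worker = asyncio.create_task(actor.run())
    # Ten concurrent senders; the mailbox serializes their requests.
    results = await asyncio.gather(*(actor.send("increment") for _ in range(10)))
    final = await actor.send("stop")
    await worker
    return results, final


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because every update goes through the mailbox, the ten concurrent increments cannot interleave, which is the property that makes this style easy to reason about.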
- Horton - Querying Large Distributed Graphs
Horton is a research project in the eXtreme Computing Group to enable querying large distributed graphs. It consists of a graph library built on top of Orleans that targets hosting large graphs in a data center. The library provides a querying interface to search the graph for matching paths.
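As a rough illustration of what searching a graph for matching paths can look like, the sketch below runs a path query over a small in-memory graph. It is not Horton's query language or API. The `Graph` class, the `match_paths` method, and the pattern format (alternating node types and edge labels) are assumptions made for this example, and everything runs in a single process rather than across a data center.

```python
from collections import defaultdict


class Graph:
    """Hypothetical in-memory graph with typed nodes and labeled edges."""

    def __init__(self):
        self.node_type = {}                  # node -> type name
        self.out_edges = defaultdict(list)   # node -> [(edge_label, neighbor)]

    def add_node(self, node, node_type):
        self.node_type[node] = node_type

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))

    def match_paths(self, pattern):
        """Return all simple paths matching [type, label, type, label, ...]."""
        node_types, edge_labels = pattern[0::2], pattern[1::2]
        results = []

        def extend(path, step):
            if step == len(edge_labels):     # every edge in the pattern is matched
                results.append(path)
                return
            for label, nxt in self.out_edges.get(path[-1], []):
                if (label == edge_labels[step]
                        and self.node_type[nxt] == node_types[step + 1]
                        and nxt not in path):  # do not revisit a node
                    extend(path + [nxt], step + 1)

        # Start a search from every node whose type matches the first pattern element.
        for node, ntype in self.node_type.items():
            if ntype == node_types[0]:
                extend([node], 0)
        return results


def build_example():
    g = Graph()
    g.add_node("alice", "Person")
    g.add_node("bob", "Person")
    g.add_node("paper1", "Paper")
    g.add_edge("alice", "manages", "bob")
    g.add_edge("alice", "authored", "paper1")
    g.add_edge("bob", "authored", "paper1")
    return g


if __name__ == "__main__":
    g = build_example()
    print(g.match_paths(["Person", "manages", "Person", "authored", "Paper"]))
```

In a distributed setting the graph would be partitioned across servers and the search would have to follow edges between partitions. The sketch ignores that and shows only the matching logic.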
- Marlowe: Intelligent Control For the Cloud
Marlowe is a framework for building intelligent control systems for cloud computing. Its scope ranges from hardware control (heating, cooling, and power) to resource allocation through job placement. Marlowe is composed of societies of intelligent agents draped over a canonical pub/sub interface. Specific control scenarios are built up using federated topologies of these agents.
Scheduling Interactive Services with Deadline and Partial Execution
- Manjula Peiris, James H. Hill, Jorgen Thelin, Sergey Bykov, Gabriel Kliot, and Christian Konig, PAD: Performance Anomaly Detection in Multi-Server Distributed Systems, in 7th IEEE International Conference on Cloud Computing (IEEE Cloud 2014), June 2014.
- Yuxiong He, Sameh Elnikety, James Larus, and Chenyu Yan, Zeta: Scheduling Interactive Services with Partial Execution, in SoCC '12: Proceedings of the 3rd ACM Symposium on Cloud Computing, October 2012.
- Peter Bodik, Ishai Menache, Mosharaf Chowdhury, Pradeepkumar Mani, David A. Maltz, and Ion Stoica, Surviving Failures in Bandwidth-Constrained Datacenters, ACM SIGCOMM, August 2012.
- Isabelle Stanton and Gabriel Kliot, Streaming Graph Partitioning for Large Distributed Graphs, in 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, ACM, August 2012.
- Navendu Jain, Ishai Menache, Joseph Naor, and F. Bruce Shepherd, Topology-Aware VM Migration in Bandwidth Oversubscribed Datacenter Networks, in ICALP (full version of the conference paper), May 2012.
- Mohamed Sarwat, Sameh Elnikety, Yuxiong He, and Gabriel Kliot, Horton: Online Query Execution Engine for Large Distributed Graphs (Demo Track), in ICDE 2012: 28th IEEE International Conference on Data Engineering, April 2012.
- Navendu Jain, Ishai Menache, Seffi Naor, and Jonathan Yaniv, Near-Optimal Scheduling Mechanisms for Deadline-Sensitive Jobs, in SPAA, 2012.
- Isabelle Stanton and Gabriel Kliot, Streaming Graph Partitioning for Large Distributed Graphs, no. MSR-TR-2011-121, 8 November 2011.
- Sergey Bykov, Alan Geller, Gabriel Kliot, James Larus, Ravi Pandya, and Jorgen Thelin, Orleans: Cloud Computing for Everyone, in ACM Symposium on Cloud Computing (SOCC 2011), ACM, October 2011.
- Brian Guenter, Navendu Jain, and CJ Williams, Managing Cost, Performance, and Reliability Tradeoffs for Energy-Aware Server Provisioning, in 30th IEEE International Conference on Computer Communications (INFOCOM), IEEE Communications Society, 10 April 2011.