I lead the Cloud and Information Services Lab (CISL) at Microsoft. The lab comprises two sub-groups: Systems and Machine Learning. As a group, we do innovative research by building production-quality systems, publishing papers in top conferences, and contributing code to open source projects. We work closely with Microsoft's Big Data teams. The Systems sub-group in particular focuses on the systems infrastructure aspects of the big data space.
Personally, I am broadly interested in building storage and compute infrastructure for datacenter settings. I enjoy building and deploying systems in practice as well as releasing them as open source. In building these systems, my work leverages technology trends in datacenter computing.
My recent work has been in the area of resource management for big data clusters. We have focused on building a scale-out resource management substrate for big data workloads. While the ideas are general, we have implemented them on top of Apache Hadoop YARN:
- Mercury (Hybrid Centralized/Distributed Scheduling; also, see YARN-2877)
- Rayon (ships as part of Apache Hadoop 2.6; see YARN-1051)
- Tetris (Packing tasks of big data jobs to improve cluster efficiency)
- Corral (Network-aware scheduling of big data jobs)
Some of the previous systems I have built and released as open source projects are:
- Kosmos distributed filesystem (KFS): I designed, implemented, and deployed KFS to manage petabytes of storage.
- Sailfish: I also designed and implemented Sailfish, a compute infrastructure that improves the handling of intermediate data (i.e., the "shuffle" phase of a Map-Reduce computation). Our results show that Sailfish can improve job completion times at scale by 20% to 5x.
At CISL, I am working on building Hadoop-related services on Windows Azure. I also collaborate extensively with colleagues in MSR-Redmond.
A full list of my publications is here.
- Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, and Sriram Rao, Efficient Queue Management for Cluster Scheduling, in European Conference on Computer Systems (EuroSys), April 2016.
- Virajith Jalaparti, Peter Bodik, Ishai Menache, Sriram Rao, Konstantin Makarychev, and Matt Caesar, Network-Aware Scheduling for Data-Parallel Jobs: Plan When You Can, ACM SIGCOMM, August 2015.
- Konstantinos Karanasos, Sriram Rao, Carlo Curino, Chris Douglas, Kishore Chaliparambil, Giovanni Fumarola, Solom Heddaya, Raghu Ramakrishnan, and Sarvesh Sakalanaga, Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters, in USENIX Annual Technical Conference (USENIX ATC'2015), USENIX – Advanced Computing Systems Association, July 2015.
- Konstantinos Karanasos, Sriram Rao, Carlo Curino, Chris Douglas, Kishore Chaliparambil, Giovanni Fumarola, Solom Heddaya, Raghu Ramakrishnan, and Sarvesh Sakalanaga, Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters, no. MSR-TR-2015-6, February 2015.
- Carlo Curino, Djellel E. Difallah, Chris Douglas, Subru Krishnan, Raghu Ramakrishnan, and Sriram Rao, Reservation-based Scheduling: If You’re Late Don’t Blame Us!, in SoCC'14, ACM – Association for Computing Machinery, November 2014.
- Robert Grandl, Ganesh Ananthanarayanan, Srikanth Kandula, Sriram Rao, and Aditya Akella, Multi-resource Packing for Cluster Schedulers, ACM SIGCOMM, August 2014.
- Mahesh Balakrishnan, Dahlia Malkhi, Ted Wobber, Ming Wu, Vijayan Prabhakaran, Michael Wei, John D. Davis, Sriram Rao, Tao Zou, and Aviad Zuck, Tango: Distributed Data Structures over a Shared Log, in SOSP, November 2013.
- Silvius Rus, Michael Ovsiannikov, Damian Reeves, Paul Sutter, Sriram Rao, Jim Kelly, Chris Zimmerman, Dan Adkins, and Thilee Subramaniam, The Quantcast File System, in 39th International Conference on Very Large Data Bases (VLDB'13), August 2013.
- Sriram Rao, Benjamin Reed, and Adam Silberstein, HotROD: Managing Grid Storage With On-Demand Replication, Workshop on Data Management in the Cloud (DMC'13), April 2013.
- Ganesh Ananthanarayanan, Christopher Douglas, Raghu Ramakrishnan, Sriram Rao, and Ion Stoica, True Elasticity in Multi-Tenant Clusters through Amoeba, in ACM Symposium on Cloud Computing, October 2012.
- Sriram Rao, Raghu Ramakrishnan, Adam Silberstein, Mike Ovsiannikov, and Damian Reeves, Sailfish: A Framework For Large Scale Data Processing, in ACM Symposium on Cloud Computing, October 2012.
- Jianjun Chen, Chris Douglas, Michi Mutsuzaki, Patrick Quaid, Raghu Ramakrishnan, Sriram Rao, and Russell Sears, Walnut: a unified cloud object store, in SIGMOD Conference, May 2012.