Sriram Rao


I am a member of the Cloud and Information Services Lab at Microsoft.

I am broadly interested in building storage and compute infrastructure for datacenter settings. I enjoy building and deploying systems in practice, as well as releasing the software I build as open source projects. My work on these systems draws on technology trends in datacenter computing.

My recent work has focused on predictable resource management in shared clusters. We have built Rayon, a layer that supports resource reservation and planning for big-data frameworks and is integrated with Apache YARN. Given knowledge of the future workload, Rayon plans the cluster's agenda, and the online scheduler executes that agenda. The combination of Rayon and YARN enables the cluster framework to meet allocation SLOs for jobs. Rayon has been released as open source software, and its code ships as part of Apache Hadoop 2.6.
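To make the plan-then-execute idea concrete, here is a minimal sketch of reservation planning over a discrete time horizon. It is not Rayon's actual algorithm or API: the `Reservation` fields, the `plan` function, and the greedy earliest-deadline-first placement are all assumptions chosen for illustration. The planner either fits a job's full demand before its deadline or rejects it, and the resulting agenda is what an online scheduler would then carry out.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    # All fields are hypothetical; they only illustrate what a planner needs.
    name: str
    start: int         # earliest time step the job may run
    deadline: int      # job must finish before this step (exclusive)
    work: int          # total container-steps required
    max_parallel: int  # most containers the job can use in one step


def plan(reservations, capacity, horizon):
    """Greedy earliest-deadline-first placement.

    Returns (agenda, rejected), where agenda maps a job name to
    {time_step: containers} and rejected lists jobs that did not fit.
    """
    free = [capacity] * horizon
    agenda = {}
    rejected = []
    for r in sorted(reservations, key=lambda r: r.deadline):
        remaining = r.work
        alloc = {}
        for t in range(r.start, min(r.deadline, horizon)):
            if remaining == 0:
                break
            take = min(free[t], r.max_parallel, remaining)
            if take > 0:
                alloc[t] = take
                remaining -= take
        if remaining == 0:
            # Commit only when the whole demand fits before the deadline.
            for t, n in alloc.items():
                free[t] -= n
            agenda[r.name] = alloc
        else:
            rejected.append(r.name)
    return agenda, rejected


jobs = [
    Reservation("A", start=0, deadline=2, work=6, max_parallel=4),
    Reservation("B", start=0, deadline=3, work=5, max_parallel=2),
    Reservation("C", start=1, deadline=4, work=4, max_parallel=2),
]
agenda, rejected = plan(jobs, capacity=4, horizon=4)
```

Rejecting a job at planning time, rather than letting it miss its deadline at run time, is what allows the accepted jobs to keep their allocation guarantees in this toy model.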

In the past, I designed, implemented, and deployed the Kosmos distributed filesystem (KFS) to manage petabytes of storage. KFS is currently deployed on a cluster of over 1000 nodes. Taking advantage of faster processors and increasing network connectivity in the datacenter, KFS has since been extended to support erasure codes (i.e., archiving "cold" data with R+S encoding).
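As a rough illustration of why erasure codes suit cold data, the sketch below uses single XOR parity, the simplest erasure code, rather than the encoding KFS actually uses. The function names and the stripe layout are assumptions for illustration only. One parity block lets any single lost block in the stripe be rebuilt from the survivors.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the stripe: the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the block at lost_index from all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


data = [b"cold", b"data", b"blob"]  # blocks must have equal length
stripe = encode(data)
rebuilt = recover(stripe, lost_index=1)
```

In this toy stripe, three data blocks plus one parity block cost 4/3 of the raw data size, whereas keeping three full replicas would cost 3x. The trade-off is extra computation and network traffic during recovery.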

I have also designed and implemented Sailfish, a compute infrastructure that improves the handling of intermediate data (i.e., the "shuffle" phase in a Map-Reduce computation). Sailfish is based on the observation that bandwidth within a datacenter will increase substantially in the next few years (viz., 10Gbps between pairs of nodes will be commonplace). We leverage this expected increase to do network-wide data aggregation, which improves disk subsystem performance during the shuffle step. Our results show that Sailfish can improve job completion times at scale by 20% to 5x.

At CISL, I am working on building Hadoop-related services on Windows Azure. I also collaborate extensively with colleagues in MSR-SVC, MSR-Redmond, and the MSR Extreme Computing Group (XCG).

A full list of my publications is here.