Sriram Rao


I am a member of the Cloud and Information Services Lab at Microsoft.

I am broadly interested in building storage and compute infrastructure for datacenter settings. My recent work has focused on predictable resource management in shared clusters. I enjoy building and deploying systems in practice, as well as releasing the software I build as open source. In designing these systems, I take advantage of technology trends in datacenter computing.

In the past, I designed, implemented, and deployed the Kosmos distributed filesystem (KFS) to manage petabytes of storage. KFS is currently deployed on a cluster of over 1000 nodes. Taking advantage of faster processors and increasing network connectivity in the datacenter, KFS has since been extended to support erasure codes (i.e., archiving "cold" data with R+S encoding).

I have also designed and implemented Sailfish, a compute infrastructure that improves the handling of intermediate data (i.e., the "shuffle" phase of a Map-Reduce computation). Sailfish is based on the observation that bandwidth within a datacenter will increase substantially in the next few years (viz., 10Gbps between pairs of nodes will be commonplace). We leverage this expected increase to do network-wide data aggregation, which improves disk subsystem performance during the shuffle step. Our results show that Sailfish can improve job completion times at scale by 20% to 5x.

Both KFS and Sailfish have been released as open-source projects.

At CISL, I am working on building Hadoop-related services on Windows Azure. I also collaborate extensively with colleagues in MSR-SVC, MSR-Redmond, and the MSR Extreme Computing Group (XCG).

A full list of my publications is here.