My interests are in building and analyzing networked systems.
Of late, I have worked on data center networks and big data analysis stacks.
I completed my PhD in Computer Science at MIT in 2008.
kandula at alum dot mit dot edu
(425) 538 5407
One Microsoft Way, Redmond, WA 98052
Current projects: lazy approximations and
Recent Papers (all)
Do the Hard Stuff First: Scheduling Dependent Computations in Data-Analytics Clusters
Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, Janardhan Kulkarni
Quickr: Lazily Approximating Complex Ad-Hoc Queries in Big-Data Clusters
Srikanth Kandula, Anil Shanbhag, Aleksandar Vitorovic, Matthaios Olma, Robert Grandl, Surajit Chaudhuri, Bolin Ding
Efficient Queue Management for Cluster Scheduling
Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, Sriram Rao
CloudBuild: Microsoft's Distributed and Caching Build Service
Hamed Esfahani, Jonas Fietz, Qi Ke, Alexei Kolomiets, Erica Lan, Erik Mavrinac, Wolfram Schulte, Newton Sanches, Srikanth Kandula
Low Latency Geo-Distributed Analytics
Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica
Calendaring for Wide Area Networks
Srikanth Kandula, Ishai Menache, Roy Schwartz, Spandana Babbula
Multi-Resource Packing for Cluster Schedulers
Robert Grandl, Ganesh Ananthanarayanan, Srikanth Kandula, Sriram Rao, Aditya Akella
Traffic Engineering with Forward Fault Correction
Hongqiang Liu, Srikanth Kandula, Ratul Mahajan, Ming Zhang, David Gelernter
Dionysus: Dynamic Scheduling of Network Updates
X. Jin, H. Liu, R. Gandhi, S. Kandula, R. Mahajan, J. Rexford, R. Wattenhofer, M. Zhang
Speeding up Distributed Request-Response Workflows
MSR, Bing, UIUC, Steklov Math Inst.
We show how to improve the tail latency of datacenter services that are built as workflows over many components by appropriately allocating additional resources across the stages of the workflow.
Also, even a small amount of incompleteness (i.e., returning partial results) can substantially improve latency when used well.
Achieving High Utilization with Software-Driven WAN
MSR, Bing, GNS, UIUC
We show that, given responsive networks and responsive applications,
adapting who gets to send how much, when, and along which network paths can improve
network utilization without sacrificing business priorities.
I've worked with some amazing interns at MSR.
Sameer Agarwal (Berkeley),
Ganesh Ananthanarayanan (Berkeley),
Spandana Babbula (IIT Madras),
Ivan Bliznets (Steklov Inst.),
Mosharaf Chowdhury (Berkeley),
Hossein Falaki (UCLA),
Jonas Fietz (EPFL),
Robert Grandl (EPFL),
Dan Halperin (UW),
Chi-Yao Hong (UIUC),
Anand Iyer (Berkeley),
Virajith Jalaparti (UIUC),
Xin Jin (Princeton),
Gautam Kumar (Berkeley),
Ang Li (Duke),
Hyeontaek Lim (CMU),
Hongqiang Liu (Yale),
Zhicheng Liu (GaTech),
Yao Lu (UW),
Matthaios Olma (EPFL),
Ashish Patro (Wisconsin-Madison),
Jonathan Perry (MIT),
Qifan Pu (Berkeley),
Anil Shanbhag (MIT/IIT Bombay),
Alan Shieh (Cornell),
Aleksandar Vitorovic (EPFL).
SWAN's traffic engineering and approximate fairness logic manages traffic on Microsoft's inter-datacenter WAN.
RoPE's reoptimization logic has shipped for SCOPE jobs on Cosmos servers since December 2011.
Mantri's outlier mitigation logic has shipped on all Cosmos servers since May 2010. Cosmos is Microsoft's internal big data service with over 10K machines.
Flare: Splitting flowlets over multiple paths. Per Conga, implemented and shipped by Cisco Insieme. Also ships with Windows Server 2012 R2; the details are here.
wcAsync: An asynchronous web traffic generator
ospfOpt: Finding optimal weights for OSPF traffic engineering
Broom: Unbiasing Internet path measurements
Srikanth Kandula is a Senior Researcher at Microsoft Research. His research interests span many
aspects of networked systems, including datacenters and data analytics infrastructure.
He is a winner of the NSDI best student paper award (2005).
He obtained his Ph.D. from the Massachusetts Institute of