Srikanth Kandula
Senior Researcher,
Microsoft Redmond

My interests are in building and analyzing networked systems. Of late, I have worked on datacenter networks and big data analysis stacks. I completed my PhD in Computer Science at MIT in 2008.

kandula at alum dot mit dot edu
(425) 538 5407
One Microsoft Way, Redmond, WA 98052

Current projects: lazy approximations and cluster scheduling.

Recent Papers (all)
Quickr: Lazily Approximating Complex Ad-Hoc Queries in Big-Data Clusters
Srikanth Kandula, Anil Shanbhag, Aleksandar Vitorovic, Matthaios Olma, Robert Grandl, Surajit Chaudhuri, Bolin Ding
Some errata in the conference version and proofs are here.
SIGMOD talk slides.
Resource management with deep reinforcement learning
H. Mao, M. Alizadeh, I. Menache, S. Kandula
Graphene: Packing and Dependency-aware Scheduling for Data-parallel Clusters
R. Grandl, S. Kandula, S. Rao, A. Akella, J. Kulkarni
OSDI talk slides. Extended technical report. (Errata: Schedules in Figure 3 have been revised in the technical report.)
A Relational Platform for Efficient Large-scale Video Analytics
Y. Lu, A. Chowdhery, S. Kandula
SOCC talk slides. Some detailed examples and a demo.
Dynamic Pricing and Traffic Engineering for Timely Inter-datacenter Transfers
Virajith Jalaparti, Ivan Bliznets, Srikanth Kandula, Brendan Lucier, Ishai Menache
Do the Hard Stuff First: Scheduling Dependent Computations in Data-Analytics Clusters
Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, Janardhan Kulkarni
Efficient Queue Management for Cluster Scheduling
Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, Sriram Rao
CloudBuild: Microsoft's Distributed and Caching Build Service
Hamed Esfahani, Jonas Fietz, Qi Ke, Alexei Kolomiets, Erica Lan, Erik Mavrinac, Wolfram Schulte, Newton Sanches, Srikanth Kandula
Low Latency Geo-Distributed Analytics
Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica
Calendaring for Wide Area Networks
Srikanth Kandula, Ishai Menache, Roy Schwartz, Spandana Babbula

Multi-Resource Packing for Cluster Schedulers
Robert Grandl, Ganesh Ananthanarayanan, Srikanth Kandula, Sriram Rao, Aditya Akella

Traffic Engineering with Forward Fault Correction
Hongqiang Liu, Srikanth Kandula, Ratul Mahajan, Ming Zhang, David Gelernter

Dionysus: Dynamic Scheduling of Network Updates
X. Jin, H. Liu, R. Gandhi, S. Kandula, R. Mahajan, J. Rexford, R. Wattenhofer, M. Zhang

Speeding up Distributed Request-Response Workflows
MSR, Bing, UIUC, Steklov Math Inst.

We show how to improve the tail latency of datacenter services that are built as workflows over many components by appropriately allocating additional resources across the stages of the workflow. Also, even a small amount of incompleteness (i.e., returning partial results) can substantially improve latency if used well.

Leveraging Endpoint Flexibility in Data-Intensive Clusters
UC Berkeley, MSR

Some network traffic in datacenters is flexible about where it is sent, subject to some constraints. We consider using such traffic to better balance network load.

Achieving High Utilization with Software-Driven WAN

We show that, given responsive networks and responsive applications, adapting who gets to send how much, when, and along which network paths can improve network utilization without sacrificing business priorities.

I've worked with some amazing interns at MSR.
Sameer Agarwal (Berkeley), Ganesh Ananthanarayanan (Berkeley), Spandana Babbula (IIT Madras), Ivan Bliznets (Steklov Inst.), Mosharaf Chowdhury (Berkeley), Hossein Falaki (UCLA), Jonas Fietz (EPFL), Robert Grandl (EPFL), Dan Halperin (UW), Chi-Yao Hong (UIUC), Anand Iyer (Berkeley), Virajith Jalaparti (UIUC), Xin Jin (Princeton), Gautam Kumar (Berkeley), Ang Li (Duke), Hyeontaek Lim (CMU), Hongqiang Liu (Yale), Zhicheng Liu (GaTech), Yao Lu (UW), Matthaios Olma (EPFL), Ashish Patro (Wisconsin-Madison), Jonathan Perry (MIT), Qifan Pu (Berkeley), Anil Shanbhag (MIT / IIT Bombay), Alan Shieh (Cornell), Aleksandar Vitorovic (EPFL).

PC member for SIGMOD, NSDI.
PC member for SIGCOMM, NSDI.
PC member for SIGCOMM, SOCC.
PC member for HotNets.
IMC, NetDB (co-chair), HotCloud, SLAML, LADIS, MobiHoc
  • SWAN's traffic engineering and approximate fairness logic manages traffic on Microsoft's inter-datacenter WAN.
  • RoPE's reoptimization logic has shipped for SCOPE jobs on Cosmos servers since December 2011.
  • Mantri's outlier mitigation logic has shipped in all Cosmos servers since May 2010. Cosmos is Microsoft's internal big data service with over 10K machines.
  • Flare: Splitting flowlets over multiple paths. Per Conga, implemented and shipped by Cisco Insieme. Also ships with Windows Server 2012 R2; the details are here.
  • wcAsync: An asynchronous web traffic generator
  • ospfOpt: Finding optimal weights for OSPF traffic engineering
  • Broom: Unbiasing Internet path measurements

Short Bio
Srikanth Kandula is a Senior Researcher at Microsoft Research. His research interests span many aspects of networked systems including datacenters and data analytics infrastructure. He is a winner of the NSDI best student paper award (2005). He obtained his Ph.D. from the Massachusetts Institute of Technology (2008).