Austin Donnelly - all publications




    • Richard Black, Austin Donnelly, and Cédric Fournet, Ethernet Topology Discovery without Network Assistance, in Proceedings of 12th IEEE International Conference on Network Protocols (ICNP'04), IEEE Computer Society, October 2004.

      This work addresses the problem of Layer 2 topology discovery. Current techniques concentrate on using SNMP to query information from Ethernet switches. In contrast, we present a technique that infers the Ethernet (Layer 2) topology without assistance from the network elements by injecting suitable probe packets from the end-systems and observing where they are delivered. We describe the algorithm, formally characterize its correctness and completeness, and present our implementation and experimental results. Performance results show that although originally aimed at the home and small office the techniques scale to much larger networks.

    • Paul Barham, Austin Donnelly, Rebecca Isaacs, and Richard Mortier, Using Magpie for request extraction and workload modelling, in Proceedings of the Sixth USENIX Symposium on Operating Systems Design and Implementation (OSDI) 2004, USENIX, December 2004.

      Tools to understand complex system behaviour are essential for many performance analysis and debugging tasks, yet there are many open research problems in their development. Magpie is a toolchain for automatically extracting a system's workload under realistic operating conditions. Using low-overhead instrumentation, we monitor the system to record fine-grained events generated by kernel, middleware and application components. The Magpie request extraction tool uses an application-specific event schema to correlate these events, and hence precisely capture the control flow and resource consumption of each and every request. By removing scheduling artefacts, whilst preserving causal dependencies, we obtain canonical request descriptions from which we can construct concise workload models suitable for performance prediction and change detection. In this paper we describe and evaluate the capability of Magpie to accurately extract requests and construct representative models of system behaviour.


    • Richard Mortier, Rebecca Isaacs, Austin Donnelly, and Paul Barham, Anemone: Edge-based network management, in INFOCOM 2005, IEEE Communications Society, March 2005.

      This proposal describes the Anemone project and a demonstration of the work so far. The project is developing an edge-based IP network management platform which utilises only information collected at the edges of the network, eschewing the need to collect data in the network core. Devoting a small fraction of hosts' idle cycles, disk space, and network bandwidth to network management allows inference of network-wide traffic patterns by synthesising end-system flow statistics with dynamic topology information obtained through passive snooping of IP routeing protocols. We claim that this approach will provide a more complete view of the network that supports sophisticated traffic engineering queries to supply the global statistics necessary to automate network control, and is future-proofed against increasing deployment of encrypting and tunnelling protocols.

    • Richard Black, Austin Donnelly, Alexandru Gavrilescu, and Dave Thaler, Fast Scalable Robust Node Enumeration, in Proceedings of 4th International IFIP-TC6 Networking Conference (NETWORKING 2005), Springer-Verlag, May 2005.

      In a Local Area Network of computers, often a machine wants to learn of the existence of all the others satisfying some condition. Specifically, there are a number of existing discovery algorithms which permit an enumerator to reliably discover protocol participants, many of them idealised. This paper provides a new technique which controls the load placed on the network, minimises the time to completion, handles networks with significant loss, and scales over many orders of magnitude. Most significantly, the protocol also deals with the possibility of a malicious enumerator; an important contribution needed for current real-world networks. We also address the effects of several systems and engineering aspects, including scheduler jitter and clock quantisation.

    • D. Gunawardena, A. Donnelly, J. Scott (Berkley), and A. Zugenmaier (NTT DoCoMo), Countering Automated Exploits with System Security CAPTCHAS, in 13th International Cambridge Security Protocols Workshop (ICSPW 05), July 2005.


    • Richard Mortier, Dushyanth Narayanan, Austin Donnelly, and Antony Rowstron, Seaweed: Distributed scalable ad hoc querying, in Proceedings of 2nd IEEE International Workshop on Networking Meets Databases (NetDB 2006), IEEE, Atlanta, GA, April 2006.
    • Evan Cooke, Richard Mortier, Austin Donnelly, Paul Barham, and Rebecca Isaacs, Reclaiming network-wide visibility using ubiquitous end system monitors, in USENIX 2006 Annual Technical Conference, USENIX, June 2006.

      Network-centric tools like NetFlow and security systems like IDSes provide essential data about the availability, reliability, and security of network devices and applications. However, the increased use of encryption and tunnelling has reduced the visibility of monitoring applications into packet headers and payloads (e.g. 93% of traffic on our enterprise network is IPSec encapsulated). The result is the inability to collect the required information using network-only measurements. To regain the lost visibility we propose that measurement systems must themselves apply the end-to-end principle: only end systems can correctly attach semantics to traffic they send and receive. We present such an end-to-end monitoring platform that ubiquitously records per-flow data and then we show that this approach is feasible and practical using data from our enterprise network.

    • Dushyanth Narayanan, Austin Donnelly, Richard Mortier, and Antony Rowstron, Delay aware querying with Seaweed, in Proceedings of 32nd International Conference on Very Large Data Bases (VLDB 2006), VLDB, Seoul, Korea, September 2006.
    • Richard Black, Austin Donnelly, Glenn Ward, Alvin Tan, and Alexandru Gavrilescu, LLTD: Link Layer Topology Discovery Protocol. A Windows® Rally™ Specification, 15 September 2006.

      This specification describes how the Link Layer Topology Discovery (LLTD) protocol operates over wired (802.3 Ethernet) and wireless (802.11) media. As the protocol name suggests, the core functions of LLTD enable applications to discover the topology of a network. In addition, LLTD has optional QoS Extensions that applications can use to diagnose problems, especially those involving signal strength on wireless networks or bandwidth constraints in home networks.

      LLTD is a key component of the Microsoft® Windows® Rally™ set of technologies.


    • Dushyanth Narayanan, Austin Donnelly, Richard Mortier, and Antony Rowstron, Delay aware querying with Seaweed, in The VLDB Journal, vol. 16, no. 1, Springer Verlag, September 2007.



    • Dushyanth Narayanan, Eno Thereska, Austin Donnelly, Sameh Elnikety, and Antony Rowstron, Migrating enterprise storage to SSDs: analysis of tradeoffs, in Proceedings of EuroSys 2009, ACM, Nuremberg, Germany, March 2009.
    • Miguel Castro, Manuel Costa, Jean-Philippe Martin, Marcus Peinado, Periklis Akritidis, Austin Donnelly, Paul Barham, and Richard Black, Fast Byte-Granularity Software Fault Isolation, in ACM Symposium on Operating Systems Principles (SOSP), Association for Computing Machinery, Inc., October 2009.

      Bugs in kernel extensions remain one of the main causes of poor operating system reliability despite proposed techniques that isolate extensions in separate protection domains to contain faults. We believe that previous fault isolation techniques are not widely used because they cannot isolate existing kernel extensions with low overhead on standard hardware. This is a hard problem because these extensions communicate with the kernel using a complex interface and they communicate frequently. We present BGI (Byte-Granularity Isolation), a new software fault isolation technique that addresses this problem. BGI uses efficient byte-granularity memory protection to isolate kernel extensions in separate protection domains that share the same address space. BGI ensures type safety for kernel objects and it can detect common types of errors inside domains. Our results show that BGI is practical: it can isolate Windows drivers without requiring changes to the source code and it introduces a CPU overhead between 0 and 16%. BGI can also find bugs during driver testing. We found 28 new bugs in widely used Windows drivers.

    • Eno Thereska, Austin Donnelly, and Dushyanth Narayanan, Sierra: a power-proportional, distributed storage system, no. MSR-TR-2009-153, November 2009.

      We present the design, implementation, and evaluation of Sierra: a power-proportional, distributed storage system. I/O workloads in data centers show significant diurnal variation, with peak and trough periods. Sierra powers down storage servers during the troughs. The challenge is to ensure that data is available for reads and writes at all times, including power-down periods. Consistency and fault-tolerance of the data, as well as good performance, must also be maintained. Sierra achieves all these through a set of techniques including power-aware layout, predictive gear scheduling, and a replicated short-term versioned store. Replaying live server traces from a large e-mail service (Hotmail) shows power savings of at least 23%, and analysis of load from a small enterprise shows that power savings of up to 60% are possible.


    • Hussam Abu-Libdeh, Paolo Costa, Antony Rowstron, Austin Donnelly, and Greg O'Shea, Symbiotic Routing in Future Data Centers, ACM SIGCOMM, August 2010.

      Building distributed applications that run in data centers is hard. The CamCube project explores the design of a shipping container sized data center with the goal of building an easier platform on which to build these applications. CamCube replaces the traditional switch-based network with a 3D torus topology, with each server directly connected to six other servers. As in other proposals, e.g. DCell and BCube, multi-hop routing in CamCube requires servers to participate in packet forwarding. To date, as in existing data centers, these approaches have all provided a single routing protocol for the applications. In this paper we explore if allowing applications to implement their own routing services is advantageous, and if we can support it efficiently. This is based on the observation that, due to the flexibility offered by the CamCube API, many applications implemented their own routing protocol in order to achieve specific application-level characteristics, such as trading off higher-latency for better path convergence. Using large-scale simulations we demonstrate the benefits and network-level impact of running multiple routing protocols. We demonstrate that applications are more efficient and do not generate additional control traffic overhead. This motivates us to design an extended routing service allowing easy implementation of application-specific routing protocols on CamCube. Finally, we demonstrate that the additional performance overhead incurred when using the extended routing service on a prototype CamCube is very low.
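
      The 3D torus described in this abstract, where each server is directly connected to six others, can be illustrated with a minimal sketch. It assumes servers are addressed by (x, y, z) coordinates on a k x k x k torus with wraparound links; the function name and the addressing scheme are assumptions made for illustration and are not taken from the CamCube API.

```python
def torus_neighbours(coord, k):
    """Return the six direct neighbours of coord on a k x k x k 3D torus.

    Hypothetical helper: assumes k >= 3 so that all six neighbours are distinct.
    """
    x, y, z = coord
    return [
        ((x + 1) % k, y, z), ((x - 1) % k, y, z),  # one step along x, with wraparound
        (x, (y + 1) % k, z), (x, (y - 1) % k, z),  # one step along y, with wraparound
        (x, y, (z + 1) % k), (x, y, (z - 1) % k),  # one step along z, with wraparound
    ]
```

      With k = 3, the server at (0, 0, 0) wraps around to (2, 0, 0), (0, 2, 0) and (0, 0, 2), which is what distinguishes a torus from a plain 3D mesh.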


    • Eno Thereska, Austin Donnelly, and Dushyanth Narayanan, Sierra: Practical Power-proportionality for Data Center Storage, Proceedings of EuroSys 2011, Salzburg, Austria, April 2011.

      Online services hosted in data centers show significant diurnal variation in load levels. Thus, there is significant potential for saving power by powering down excess servers during the troughs. However, while techniques like VM migration can consolidate computational load, storage state has always been the elephant in the room preventing this powering down. Migrating storage is not a practical way to consolidate I/O load. This paper presents Sierra, a power-proportional distributed storage subsystem for data centers. Sierra allows powering down of a large fraction of servers during troughs without migrating data and without imposing extra capacity requirements. It addresses the challenges of maintaining read and write availability, no performance degradation, consistency, and fault tolerance for general I/O workloads through a set of techniques including power-aware layout, a distributed virtual log, recovery and migration techniques, and predictive gear scheduling. Replaying live traces from a large, real service (Hotmail) on a cluster shows power savings of 23%. Savings of 40–50% are possible with more complex optimizations.


    • Paolo Costa, Austin Donnelly, Antony Rowstron, and Greg O'Shea, Camdoop: Exploiting In-network Aggregation for Big Data Applications, in 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI'12), USENIX, April 2012.

      Large companies like Facebook, Google, and Microsoft as well as a number of small and medium enterprises daily process massive amounts of data in batch jobs and in real time applications. This generates high network traffic, which is hard to support using traditional, oversubscribed, network infrastructures. To address this issue, several alternative network topologies have been proposed, aiming to increase the bandwidth available in enterprise clusters. We observe that in many of the commonly used workloads, data is aggregated during the process and the output size is a fraction of the input size. This motivated us to explore a different point in the design space. Instead of increasing the bandwidth, we focus on decreasing the traffic by pushing aggregation from the edge into the network. We built Camdoop, a MapReduce-like system running on CamCube, a cluster design that uses a direct-connect network topology with servers directly linked to other servers. Camdoop exploits the property that CamCube servers forward traffic, to perform in-network aggregation of data during the shuffle phase. Camdoop supports the same functions used in MapReduce and is compatible with existing MapReduce applications. We demonstrate that, in common cases, Camdoop significantly reduces the network traffic and provides high performance increase over a version of Camdoop running over a switch and against two production systems, Hadoop and Dryad/DryadLINQ.
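
      The idea of in-network aggregation mentioned in this abstract can be sketched as follows, assuming a word-count style job and a simple aggregation tree. This is not Camdoop's implementation: the tree layout, the node names and the `aggregate` function are hypothetical. Each node merges its own partial result with those of its children before forwarding, so only one combined result travels up each link.

```python
from collections import Counter

def aggregate(tree, local, node):
    """Merge a node's local counts with the partial results of its children.

    tree  maps a node to the list of its children (hypothetical topology).
    local maps a node to its locally computed key -> count dictionary.
    """
    total = Counter(local[node])
    for child in tree.get(node, []):
        # Each child forwards a single, already-combined partial result.
        total += aggregate(tree, local, child)
    return total

# Example aggregation tree: root <- a <- c, and root <- b.
tree = {"root": ["a", "b"], "a": ["c"]}
local = {
    "root": {"x": 1},
    "a": {"x": 2, "y": 1},
    "b": {"y": 3},
    "c": {"x": 1},
}
result = aggregate(tree, local, "root")
```

      Because the combined result is typically much smaller than the sum of its inputs, merging at intermediate nodes reduces the data sent over each link compared with shipping every partial result to a single reducer.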

    • Antony Rowstron, Dushyanth Narayanan, Austin Donnelly, Greg O'Shea, and Andrew Douglas, Nobody ever got fired for using Hadoop on a cluster, in 1st International Workshop on Hot Topics in Cloud Data Processing (HotCDP 2012), ACM, 10 April 2012.

      The norm for data analytics is now to run them on commodity clusters with MapReduce-like abstractions. One only needs to read the popular blogs to see the evidence of this. We believe that we could now say that "nobody ever got fired for using Hadoop on a cluster"! We completely agree that Hadoop on a cluster is the right solution for jobs where the input data is multi-terabyte or larger. However, in this position paper we ask if this is the right path for general purpose data analytics? Evidence suggests that many MapReduce-like jobs process relatively small input data sets (less than 14 GB). Memory has reached a GB/$ ratio such that it is now technically and financially feasible to have servers with 100s GB of DRAM. We therefore ask, should we be scaling by using single machines with very large memories rather than clusters? We conjecture that, in terms of hardware and programmer time, this may be a better option for the majority of data processing jobs.



    • Shobana Balakrishnan, Richard Black, Austin Donnelly, Paul England, Adam Glass, Dave Harper, Sergey Legtchenko, Aaron Ogus, Eric Peterson, and Antony Rowstron, Pelican: A building block for exascale cold data storage, 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI '14), 6 October 2014.

      A significant fraction of data stored in cloud storage is rarely accessed. This data is referred to as cold data; cost-effective storage for cold data has become a challenge for cloud providers. Pelican is a rack-scale hard disk based storage unit designed as the basic building block for exabyte scale storage for cold data. In Pelican, server, power, cooling and interconnect bandwidth resources are provisioned by design to support cold data workloads; this right-provisioning significantly reduces Pelican’s total cost of ownership compared to traditional disk-based storage.

      Resource right-provisioning in Pelican means only 8% of the drives can be concurrently spinning. This introduces complex resource management to be handled by the Pelican storage stack. Resource restrictions are expressed as constraints over the hard drives. The data layout and IO scheduling ensure that these constraints are not violated. We evaluate the performance of a prototype Pelican, and compare against a traditional resource over-provisioned storage rack using a cross-validated simulator. We show that compared to this over-provisioned storage rack Pelican performs well for cold workloads, providing high throughput with acceptable latency.
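
      A constraint of the kind described here, that only 8% of the drives may spin at once, could be checked along the following lines. This is a minimal sketch under stated assumptions, not the Pelican storage stack: the function name and the drive counts in the usage note are hypothetical, and the real system expresses several resource constraints rather than a single global budget.

```python
def can_spin_up(spinning, requested, total_drives, max_percent=8):
    """Return True if spinning up `requested` more drives stays within the budget.

    Hypothetical helper: models only a single rack-wide limit on the share of
    drives that may be spinning concurrently (8% by default, as in the abstract).
    """
    # Integer arithmetic avoids floating-point rounding at the boundary.
    limit = total_drives * max_percent // 100
    return spinning + requested <= limit
```

      For an illustrative rack of 100 drives with 6 already spinning, a request for 2 more is admitted and a request for 3 more is refused, so a scheduler would have to queue the latter until other drives spin down.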


    • Sergey Legtchenko, Xiaozhou Li, Antony Rowstron, Austin Donnelly, and Richard Black, Flamingo: Enabling Evolvable HDD-based Near-Line Storage, in 14th USENIX Conference on File and Storage Technologies (FAST 16), USENIX Association, Santa Clara, CA, 22 February 2016.

      Cloud providers and companies running large-scale data centers offer near-line, cold, and archival data storage, which trade access latency and throughput performance for cost. These often require physical rack-scale storage designs, e.g. Facebook/Open Compute Project (OCP) Cold Storage or Pelican, which co-design the hardware, mechanics, power, cooling and software to minimize costs to support the desired workload. A consequence is that the rack resources are restricted, requiring a software stack that can operate within the provided resources. The co-design makes it hard to understand the end-to-end performance impact of relatively small physical design changes and, worse, the software stacks are brittle to these changes.

      Flamingo supports the design of near-line HDD-based storage racks for cloud services. It requires a physical rack design, a set of resource constraints, and some target performance characteristics. Using these, Flamingo is able to automatically parameterize a generic storage stack to allow it to operate on the physical rack. It is also able to efficiently explore the performance impact of varying the rack resources. It incorporates key principles learned from the design and deployment of cold storage systems. We demonstrate that Flamingo can rapidly reduce the time taken to design custom racks to support near-line storage.