Selected PhD Projects – 2014

Microsoft Research PhD Scholarship: notification of awards

The following 22 projects have been selected for funding starting in the academic year 2014–2015. Four of the proposals relate to the Joint Initiative in Informatics with Edinburgh University and four of the proposals relate to the Joint Initiative with University College London. PhD supervisors are actively recruiting for these projects; candidate selection should be complete by June 2015.

Large Scale Diverse Learning for Structured Output Prediction

Supervisor: Pawan Kumar, Ecole Centrale Paris, INRIA Saclay
Microsoft Research supervisor: Sebastian Nowozin

Summary: Structured output prediction plays a central role in many domains of artificial intelligence. In order to greatly enhance its applicability, we plan to develop novel learning paradigms that provide accurate models from diverse data. The risk with diverse learning is that it often results in large non-convex problems, which makes the learner computationally inefficient and prone to bad local minimum solutions. In order to tackle the aforementioned risk, we will work in three directions. First, we will develop novel, mathematically sound techniques based on self-paced learning (SPL). Second, we will incorporate active learning with SPL such that a user can be employed to provide annotations for challenging samples. Third, we will develop large-scale parallel algorithms for solving the resulting optimization problem using dynamic dual decomposition. The infrastructure developed as part of this work will be tested on the challenging problem of semantic segmentation.

Computational Algorithms as Biological Regulatory Networks 

Supervisor: Attila Csikasz-Nagy, King's College London
Microsoft Research supervisor: Luca Cardelli

Summary: Evolution selected for biological network designs that are capable of fast and proper responses to inputs. Computing is also designed to handle tasks in a fast and precise way. The efficiency of biological systems is often copied when designing biologically inspired tools (evolutionary algorithms, machine learning, and so forth). The supervisor of this project, in collaboration with Luca Cardelli, investigated the similarities between a cell cycle regulatory switch and the Approximate Majority (AM) algorithm of distributed computing. The AM algorithm computes the majority of two finite populations and the cell cycle switch ensures that cells divide only after DNA replication is finished. The functions of the two switches differ but their dynamical behaviour is similar. The project goes further to investigate how biological regulatory systems can be converted to computational algorithms and how algorithms can provide us hints about non-clearly understood biological systems.

Leveraging Data Reuse for Efficient Ranker Evaluation in Information Retrieval  

Supervisor: Maarten De Rijke, University of Amsterdam
Microsoft Research supervisor: Filip Radlinski

Summary: Interleaved comparison methods enable the assessment of rankers using implicit feedback from actual users. The use of interleaved comparisons gives rise to a key challenge in ranker evaluation: how to efficiently determine the best ranker from a set using only pairwise comparisons. This challenge can be formalized as a K-armed dueling bandits problem. We aim to develop new K-armed dueling bandits algorithms that reduce the number of evaluations required. The idea is to create methods that use the results of each interleaved comparison to update estimates of the relative quality of all rankers under evaluation, not just of the two rankers that were interleaved. Doing so is possible if each interleaved comparison is conducted probabilistically, so that unbiased and consistent estimates of each ranker’s quality can be obtained. This allows us to extend interleaved comparison methods to multileaved comparisons, in which the ranking shown to the user is constructed from multiple rankers.

Sketching Algorithms for Massive Graphs and Matrices

Supervisor: Graham Cormode, University of Warwick
Microsoft Research supervisor: Milan Vojnovic

Summary: Increasingly, we are faced with larger and larger volumes of data from which to extract insights and intelligence. An important case surrounds data that can be represented as a graph or (adjacency) matrix. A promising approach is to look for ways to ``sketch'' such structures: to build a representation that is much more compact than the input, but which allows some function of interest on the original data to be approximated accurately using the sketch. Such sketches are well known and widely used for data that can be represented as a vector (such as to identify the most frequent elements, or to count the number of distinct items). The goal of this scholarship project is to develop new algorithms for sketching of massive graphs and matrices, and to demonstrate their usefulness via theoretical analysis and empirical evaluation.

Verifying Concurrent Higher-Order Programs

Supervisor: Matthew Hague, Royal Holloway UoL
Microsoft Research supervisor: Andrey Rybalchenko

Summary: The importance of formal verification of software is well recognised. Microsoft have been at the forefront of this success both through the seminal SLAM project (part of the Static Driver Verifier tool) and with the celebrated Z3 SMT solver. Higher-order programming features are increasingly supported by modern languages, such as F Sharp, Python, Scala, Haskell, OCaml and C++. Thus, this forms an increasingly essential topic for verification for which there have been several important recent advances. Furthermore, it is clear that concurrency will become the dominant programming paradigm, due to both the ubiquity of multi- and many-core machines and the increasingly distributed nature of computation, especially in the processing of ``Big Data''. The goal of this proposal is to develop the theoretical underpinnings and successful practical applications of higher-order verification to concurrent systems, resulting in new program models and tools applicable to real-world code.

GeoGraph: Efficient Geographically Distributed Graph Infrastructure

Supervisor: Fernando Pedone, University of Lugano
Microsoft Research supervisor: Flavio Junqueira

Summary: Many current online services build on graph data structures (for example, social networks, collaborative applications, recommendation systems). Services in this category share a number of common characteristics: they typically serve a large user base, possibly geographically distributed; the underlying graph structure has particular properties; users can tolerate certain anomalies but expect some guarantees from the system (for example, preserving causal dependencies among requests); most requests are either small graph updates or relatively large queries. The goal of this project is two-fold: First, we aim to propose consistency criteria well-adapted to graph-dependent online services. In this sense, consistency must account for the typical operations performed on graphs by online services. Second, we intend to design, implement, and experimentally assess an infrastructure that implements this isolation level (or levels).

Computation During Development: Characterising the Molecular Programs that Underlie Pluripotency and Differentiation in Embryoni

Supervisor: Brian Hendrich, University of Cambridge
Microsoft Research supervisor: Stephen Emmott

Summary: Embryonic stem cells are a unique type of cell derived from early mammalian embryos, which possess the ability to self-renew and to differentiate into all somatic lineages—a characteristic known as pluripotency. This potential makes them an attractive prospect for regenerative medicine, as well as a useful model for understanding mammalian development. A challenge remains to uncover the processes that underlie cell fate determination and how to translate this information into a predictive explanation of behaviour. Such challenges necessitate new approaches, and in particular, novel computational methods to uncover the information processing that is performed throughout cellular decision-making. With this PhD project we propose to combine both state-of-the-art experimental and computational methods to yield new insight into a fundamental biological question, and to impact significantly how research is conducted within the field of stem cell biology.

The Generality and Mechanism of Bet-Hedging in Bacteria

Supervisor: James Locke, Sainsbury Laboratory
Microsoft Research supervisor: Andrew Phillips

Summary: Gene circuits exhibit fluctuations (‘noise’) in the level of key regulatory components. Increasingly, noise appears to play functional roles. For example, noise could enable a subpopulation of cells to enter a transient antibiotic-resistant state, enhancing their survival. By ensuring that cells do not all exist in the same transcriptional state, the colony can ‘bet hedge’ against future environmental changes. We will use single-cell time-lapse microscopy to examine the generality of bet-hedging in B. subtilis. After screening for pathways that show variable gene expression, we will use a combination of synthetic biology and computational modelling to discover the mechanisms that allow cells to probabilistically enter these states. The work will lead to an understanding of how and why alternative transcriptional states are generated.

Statistical Models and Methods for Privacy Technologies

Supervisor: Claudia Diaz, KU Leuven
Microsoft Research supervisor: Markulf Kohlweiss

Summary: Social interactions are increasingly conducted online. This introduces privacy risks as the analysis of user communication patterns might reveal sensitive private information. To address this problem, a number of distributed architectures for privacy-friendly communications have been proposed. While content protection can be achieved through the use of cryptographic mechanisms, protecting these systems with respect to traffic analysis attacks remains largely an open problem. In this project, we will develop behavioural models that express how users interact with each other, and methods for extracting statistics on networked data in a privacy friendly way. We will also develop advanced inference techniques for evaluating the security properties of distributed communications networks with respect to traffic analysis attacks. Based on the results of this analysis we will propose traffic analysis resistant designs that take into account user communication behaviour.

Rethinking Resource Allocation in Data Centres: Optimization, Incentives, and Beyond

Supervisor: Michael Schapira, Hebrew University of Jerusalem
Microsoft Research supervisor: Hitesh Ballani/Ian Kay

Summary: Cloud services, ranging from search to email and social networks, are driving the creation of larger and larger data centres. We advocate a holistic, application-centric approach to resource allocation in data centres: jointly optimizing the allocation of computational resources (for example, CPU, memory) and network resources (for example, bandwidth) by leveraging predictability in data centres. We argue that this approach holds great promise both for rendering data centres more profitable (via more efficient use of the data centres resources) and for improving data centre-provided services and perceived fairness from the user's perspective. We aim to draw ideas from (CS and economic) theory devise practical schemes for resource allocation in data centres with provable guarantees.

Why Do People Communicate?

Supervisor: Constantine Sandis, Oxford Brookes University
Microsoft Research supervisor: Richard Harper

Summary: This project proposes a philosophical anthropology of communication accompanied by an exploration of whether, and if so how, such an understanding can offer insights for innovation in the area of communications engineering and HCI. It does so by appealing to a picture of human nature which understands communicative minds in terms of shared intentional behaviour, rather than the other way round.

Reasoning about Side Channels in Cryptographic Protocols

Supervisor: Boris Köpf, IMDEA Software Institute
Microsoft Research supervisor: Cedric Fournet

Summary: Side-channel attacks break cryptosystems by exploiting signals that are unwittingly emitted by their implementations. Many defence mechanisms rely on the context in which a cryptographic primitive is used, that is, the protocol. In the course of this project, we will devise techniques that enable reasoning about side-channel leakage in cryptographic protocols. The promise of our approach is to achieve high degrees of security and performance at the same time. To this end, we will tackle two open challenges: first, how to do compositional reasoning about leakage and its aggregation; second, how to embed low-level binary analysis into this compositional context.

Understanding Flows of Personal Information in a Connected World

Supervisor: Alastair Beresford, University of Cambridge
Microsoft Research supervisor: Natasa Milic-Frayling

Summary: Applications today consume, synchronise and share personal information between a multitude of computers, often owned and controlled by different entities. Therefore personal information, once created, travels across a myriad of devices and machines, often geographically distributed across continents. Given this, the central research question we wish to address in this project is: when carrying out a task, how does the user know what personal data is involved, how it is shared, and what the privacy trade-offs are in competing products? Our approach combines together technical analysis, to understand where personal information flows today, and lab experiments, to understand what humans believe is going on. Our aim is to improve our understanding of the flow of personal information in modern distributed applications, and to develop methods of sharing our understanding with privacy experts, developers and application users.

Advancing Random Forests and Other Ensembles

Supervisor: Nando De Freitas, University of Oxford
Microsoft Research supervisor: Antonio Criminisi

Summary: This work will advance the theory and practice of one of the most popular classification algorithms in industry: random forests. These algorithms are behind many innovations including Kinect and the popular news app Zite (which powers CNN trends).

Joint Initiative with University College London

3D World: Creation, Abstraction, and Application of Massive Crowd-sourced Collections of Heterogeneous 3D Models

Supervisor: Niloy Mitra
Microsoft Research supervisor: Shahram Izadi/Pushmeet Kohli

Summary: Fast, portable, and cheap sensors for 3D acquisition (for example, Microsoft Kinect, other RGBD scanners) are becoming commonplace in the technology marketplace. This will invariably lead to the creation of massive repositories of unstructured acquisitions (for example, depth scans of indoor environments) collected by users such as Kinect@home, similar to what we witnessed for images (for example, Flickr, Picasa, and so forth).

Private Computation as a Service

Supervisor: George Danezis
Microsoft Research supervisor: Cedric Fournet

Summary: The objective of this project is to produce tools and services that can be used by software engineers to build and deploy systems using private computation techniques without any need for them to understand the underlying cryptology, mathematics or theory. This represents a serious research problem on multiple fronts: Special languages or subsets of high-level languages are necessary to express computations to be performed on private data as simple programs. Those have to be automatically transformed into the form necessary to be run privately; Secondly, the architecture required to support private computations will be different from traditional cloud storage and compute nodes, and has to be designed with security in mind; Finally, there is a need to provide systems’ support for a number of private computation techniques that require both high level code and / or private data to securely migrate across machines and trust boundaries.

Understanding the Moving Quadruped: Computer Vision to Advance Science, Medicine, and Veterinary Care

Supervisor: Gabriel Brostow
Microsoft Research supervisor: Jamie Shotton

Summary: Driven by specific scientific questions, we will build new algorithms and devices to track the body and feet of mice, horses, and other quadrupeds that fall between these two extremes. On the biological side, we aim to investigate how body morphology and the ultimate constraints of stability, energetic cost, and dexterity shape animal gait. On the clinical side, this studentship will make essential contributions to applying the developed techniques across species with the potential for large welfare and economic benefits. This project offers a rare opportunity to bring together top experts in vision and biology just when their respective strengths are ready to be really challenged by the broader scientific community.

Ad-hoc Cross-Device Interactions Facilitating Small-Group Collaborative Explorations and Curation of Historic Documents

Supervisor: Nicolai Marquardt
Microsoft Research supervisor: Abigail Sellen

Summary: Our proposed project explores the design of ad-hoc cross-device interactions facilitating collaborative small-group activities. The case study for contextualising our ideas is curating digital content for an ongoing cultural history project in Cambridge. This enables us to link in with a local community project that is based on real needs and provides new opportunities for technology interventions. The process of curating documents in new ways can be a demanding task, requiring to collect, review, and combine a large collection of raw material. Our goal is to design novel interaction techniques across multiple tablet computers that let people use their portable devices in concert, for new ways for sharing, extending, transferring, or overlaying content across these devices. Our project will investigate current curating practices, develop a technical framework for cross-tablet interactions, and design interaction techniques that fit into the current practices of the community project.

Joint Initiative in Informatics with University of Edinburgh

Vision as Inverse Graphics

Supervisor: Christopher Williams
Microsoft Research supervisor: Pushmeet Kohli

Summary: The goal of scene understanding is to infer the objects present in a scene, their positions and poses, the illuminant, and so forth These are the *latent variables* which must be inferred in order to understand the scene. We will develop a stochastic scene generator, and render these scenes to produce images; we will then train recognition networks to infer the relevant latent variables, and refine the predictions by probabilistic programming on the generative model. The great advantage of using a scene generator and renderer is that large quantities of image data with the associated latent variables can be obtained for training the recognition networks.

TypeScript: The Next Generation

Supervisor: Philip Wadler
Microsoft Research supervisor: Andy Gordon

Summary: Our short-term goal is to build a tool, TypeScript: The Next Generation (TNG), that generates wrapper code from TypeScript ``import'' declarations to detect and pinpoints type errors. A wrapper will accept any JavaScript value as input, and either raise an error or return a value guaranteed to satisfy the invariant associated with the corresponding type. Our hypothesis is that TypeScript TNG will aid debugging and increase reliability of TypeScript and JavaScript code. The long-term goal is to extend the foundations of the blame calculus to support a wide-spectrum type system, ranging from dynamic types (as in JavaScript or Racket) through Hindley-Milner types (as in F# or Haskell) to dependent types (as in F* or Coq). Our hypothesis is that a wide-spectrum type system will increase the utility of dependent types, by allowing dynamic checks to be used as a fall back when static validation is problematic.

Formal Language Support for Ecological Modelling

Supervisor: Jane Hillston
Microsoft Research supervisor: Matthew Smith

Summary: Advances in computational power, and our ability to collect data on an unprecedented scale, have led to a fertile period for computational modelling. In this project, we consider ecological modelling, seeking to explore the benefits that may be gained through the use of formal modelling languages, with automated translation into mathematical and computational representations. We are encouraged by previous success in this direction in systems biology, but there are significant new challenges in this domain. We will focus on representation of space and how the spatial relationship between entities can affect patterns of interaction and consequently, ecological processes. Developing over a number of phases, motivated by increasingly rich ecological scenarios, the outcome will be a spatial population process algebra; mappings from this modelling language to appropriate computational modelling engines; and complementary static analysis techniques.

SMT for Nonlinear Constraints with Application to Computational Biology

Supervisor: Paul Jackson
Microsoft Research supervisor: Christoph Wintersteiger

Summary: Given their immense success in powering modern program verification, Satisfiability Modulo Theories (SMT) solvers have begun to gain the attention of practitioners in other areas of science and engineering. Common to many new disciplines applying SMT (among them computational biology) is a pressing need for reasoning with nonlinear constraints over continuous variables. Unfortunately, this is an area where SMT solvers are still very weak. The proposed research shall address this shortcoming by developing nonlinear reasoning techniques which radically enhance the utility of SMT solvers for computational biology.