Selected PhD Projects – 2013

Microsoft Research PhD Scholarship: notification of awards

The following 20 projects have been selected for funding starting in the academic year 2013–2014. Four of the proposals relate to the Joint Initiative in Informatics with Edinburgh University and four of the proposals relate to the new Joint Initiative with University College London. PhD scholar awards will be announced in June 2014.

Expert Visual Classification with Thousands of Categories

Supervisor: Matthew Brown, University of Bath
Microsoft Research supervisor: Antonio Criminisi

Research summary: Most work in visual classification is aimed at emulating, rather than augmenting, human recognition performance. This is partially attributable to the current paradigm in object recognition, which involves non-expert human labelling of a large number of training images, followed by supervised classification. This approach works well for small numbers of visually sparse classes, but fails to replicate the abilities of experts in more technical domains, such as medical image understanding. Furthermore, state-of-the-art techniques fail to scale effectively to large numbers of visual categories, which are commonplace in medical or biological taxonomies. In this project we will develop general techniques for scalable, expert visual classifiers that augment the capabilities of a typical person, aiming to furnish them with the visual insight of a botanist, architect or doctor.

Investigation of a Radically New Energy Technology Based Upon Programmable Artificial Photosynthesis  

Supervisor: Alex Chin, University of Cambridge
Microsoft Research supervisor: Stephen Emmott 

Research summary: There is an urgent need to find alternative (non-carbon-based) sources of energy to both meet a threefold increase in energy demand this century, and to mitigate climate change effects. As a consequence, the concept of Artificial Photosynthesis is rapidly gaining momentum; the prevailing research focus being on the development of solid-state materials able to split water and extract the resulting hydrogen as energy. We propose a radically new, conceptually, form of Artificial Photosynthesis based on photosynthesis as programmable computation, implementable through Living Software on a ‘metabolic material’, to harvest light and convert it to directly usable energy. The project is highly innovative, and is the first ever of its kind, focused at the intersection of twenty-first century computing (programming biology) and a key twenty-first century societal challenge—solving the energy crisis.

Compositional Verification of Scalable Joins by Protocol-Based Refinement   

Supervisor: Derek Dreyer, Max Planck Institute for Software Systems (MPI-SWS)
Microsoft Research supervisor: Nick Benton 

Research summary: Fournet and Gonthier's "join calculus" supports a powerful combination of message-passing concurrency and declarative atomicity, thus enabling the convenient encoding of higher-level synchronization abstractions. Turon and Russo have recently developed an efficient implementation of the C# library for joins that scales well to multicore architectures. In this project, we aim to verify a realistic software stack for scalable joins, including both Turon and Russo's implementation and a representative suite of joins-based client code that depends on it. Our hypothesis is that a feasible and effective way to achieve this goal is by using contextual refinement to isolate the verification of the implementation from the verification of the client code. Toward this end, we propose to use recent advances in the technology of "Kripke logical relations" in order to formalize the "protocols" that govern the use of private state in both implementation and client code.

3D Smart Memory for Scale-Out Servers   

Supervisor: Babak Falsafi, EPFL
Microsoft Research supervisor: Dushyanth Narayanan 

Research summary: With the end of Dennard scaling on the horizon and the emergence of Big Data and the growing demands for processing, communication and storing massive data, 3D-stacked IC promises to be viable approach to scalability in servers. While 3D-stacked RAM is already available as a product, there a number of fundamental challenges related to power and thermal density, precluding the use of stacking of server grade logic dies on top of DRAM. Moreover, while the recently proposed scale-out processor architectures based on efficient mobile cores dramatically improve performance density and throughput in servers (e.g., by 10x), they do not alleviate the power density and thermal problems in stacking as compared to conventional server dies because they operate at a commensurate power budget. In this proposal, we will investigate logic die organizations using extremely low-power cores that would readily be stackable on existing DRAM stacks with available thermal and power density budgets.

Intuitive and Efficient Design of Enclosures for .NET Gadgeteer    

Supervisor: Manfred Lau, University of Lancaster
Microsoft Research supervisor: Nicolas Villar

Research summary: The Gadgeteer framework provides a platform for designing and fabricating electronic devices with a set of small electronic components. The current approach for modelling an enclosure (or physical case) for such a device is with Solidworks, and this process typically takes hours. We propose to develop a set of easy-to-use 3D modelling and digital fabrication tools that novices can use to design an enclosure in the order of minutes. We propose to explore various methods to build these tools, including: a 3D hand gestures and augmented reality interface, a 3D component-centric interface, and grammar-based representations and algorithms. We will take a user-oriented approach to evaluate our interactive tools. 

Natural User Interfaces for the Developing World     

Supervisor: Gary Marsden, University of Cape Town
Microsoft Research supervisor: Richard Harper

Research summary: The goal of this research is to investigate the design and development of interfaces that are ‘natural’ for those living in the developing world.

Natural user interfaces (NUI) hold out the promise of revolutionising the experience and use of computers. To date, the bulk of research in this area has explored natural interfaces for the office and home of the twenty-first century in the developed world. Much less research attention has been given to creating such interfaces for the developing world. This is because innovation in this setting requires a much broader approach to create something that is both practical to design and build, and which deals with the particular constraints of the developing world.

Understanding the Dynamics of Embryonic Stem Cells Differentiation: A Combined Experimental and Modeling Approach     

Supervisor: Graziano Martello, University of Cambridge
Microsoft Research supervisor: Hillel Kugler

Research summary: Pluripotency is a key feature of embryonic stem (ES) cells, and is defined as the ability to form all cell lineages present in the adult body (Smith, 2006). Pluripotent ES cells are thus potentially a great biomedical resource and, at the same time, a unique and intellectually fascinating cell type. For these reasons ES cells have been extensively investigated in the past three decades, leading to a comprehensive understanding of the signals maintaining pluripotency and of the transcription factors that constitute the gene regulatory network (GRN) of ES cells. Surprisingly, very little is known about how ES cells exit the pluripotent state and start to differentiate. Our aim is to understand the molecular events associated with this transition and its dynamics.

Exploiting Mobile Sensing and Geo-social information in Mobile Recommendation Systems      

Supervisor: Cecilia Mascolo, University of Cambridge
Microsoft Research supervisor: Natasa Milic-Frayling

Research summary: In this proposal we extend location based recommendation by bridging the gap between mobile sensing, location- based and on-line systems. We aim to address the problem of collecting user contextual information in real time from sensor equipped smartphones and integrate that with social and geographical information, also gathered through the interaction of phones with the mobile web, in order to offer optimal content on-line and geographic suggestions to users. The technical challenges of providing high quality recommendations are:

  1. The ability to accurately detect the user context in presence of energy limitations for phone sensing
  2. The integration of the most effective spatio-temporal data mining features in an environment of sparse, very large datasets
  3. The urban sensing of events in the city by observing large streams of online data for the detection of ongoing events
  4. The offering of online recommendations using real time back end computation of current user input

Future File Systems: Mechanized Specification, Validation, Implementation and Verification of File Systems       

Supervisor: Tom Ridge, University of Leicester
Microsoft Research supervisor: Andrew Kennedy

Research summary: We aim to produce a formal specification of file system behaviour, and a verified file system implementation. The specification will be validated against existing real-world implementations, to ensure that it is reasonable. The implementation will be verified correct according to the specification, and the proof mechanized in the HOL4 theorem prover. This would result in a file system implementation with exceptionally strong reliability guarantees.

The specification and implementation have many applications. For example, the specification could be used as a foundation for the further verification of applications that use a file system, such as databases and persistent message queues. The implementation could be used as a key component in systems with high reliability requirements, such as NASA space exploration missions. A verified file system would also be a key component of a fully-fledged verified operating system, such as those currently being developed at Microsoft and elsewhere.

All-Pay Auctions in the Real World

Supervisor: Jeffrey Rosenschein, Hebrew University
Microsoft Research supervisor: Yoram Bachrach

Research summary: In the past decade, the field of algorithmic game theory (AGT) has grown considerably, both with respect to advances in research, as well as a significant increase in its real-world application. The use of AGT includes the utilization of game-theoretic concepts (such as maximizing auction revenue and making voting less easily manipulated) in various Internet scenarios. One of the major areas that has developed significantly is the involvement of a large number of individuals in working, as freelancers, on specific problems or tasks, i.e., crowdsourcing. Interest in crowdsourcing has rekindled research into all-pay auctions, which can help in modeling these interactions. We propose exploring various novel aspects of all-pay auctions, such as the possibility of mergers and collusions, and the use of all-pay auctions in real-world mechanisms.

First-Order Satisfiability Modulo Theories

Supervisor: Philipp Ruemmer, Uppsala University
Microsoft Research supervisor: Christophe Wintersteiger

Research summary: SAT and SMT solvers form the backbone of many of today's verification systems, responsible for discharging verification conditions that encode correctness properties of hardware or software designs. SMT solvers are essentially ground decision procedures, in the sense that quantifier treatment is implemented by means of instantiation on top of a procedure for quantifier-free reasoning. For verification and other applications, efficient handling of quantifiers has been identified as one of the main challenges in the development of solvers.

This proposal is about lifting the SMT paradigm to the first-order level, by including quantifiers as first-class citizens in solvers.

Learning to Index

Supervisor: Ole Winther, Technical University of Denmark
Microsoft Research supervisor: Ulrich Paquet

Research summary: Algorithmic scalability is important for “Big Data” Machine Learning. However, an oft-neglected axis is that of the scalability of decision-making. For matrix factorization-based recommender systems, it is a linear in the number of objects. Under a strict time budget with millions of objects, sublinear retrieval algorithms are required for true scalability. Fast tree structures exist for O(log n) retrieval when the triangle inequality holds between objects, or when the objects are inherently indexable. However, current algorithms are not principally designed to operate within a sublinear retrieval criterion.

Models can be specified such that it is forced to learn a good index over the data: This proposal investigates how a “matrix factorization” recommender can learn constrained parameters that are amenable to O(log n) indexing. This setting of “learning to index” is of cardinal importance to large-scale online retrieval, which is hampered by the lack of indexable solutions.

Joint Initiative with University College London

Weakness as a Virtue

Supervisor: Jade Alglave
Microsoft Research supervisor: Byron Cook

Research summary: The number of interleavings that a concurrent program can have is typically identified as the root cause of the difficulty of automatic analysis of concurrent software. Weak memory, as implemented by modern multiprocessors such as Intel x86, IBM Power and ARM, is generally believed to make this problem even harder. On the contrary, I believe that weakness can be a virtue.

The hypothesis of this proposal is that by embracing rather than fleeing from weak memory, we can obtain efficient verification techniques. We will experiment with this hypothesis by rejecting total orders when modelling the executions of concurrent programs. Following the partial order semantics tradition, we will model executions with partial orders. I believe that these models have a great potential for practical verification, which has not been fully realised yet.

Experimental and Computational Studies on the Human Antigen-Specific T Cell Repertoire

Supervisor: Benny Chain
Microsoft Research supervisor: Neil Dalchau

Research summary: The adaptive immune response is defined by clonal expansion of individual lymphocytes in response to antigen. However, the number of different clones responding to any given antigen, and the relationship between clonal frequency and antigen specificity and affinity remain unknown. In this project, we use high throughput sequencing to determine the repertoire of T cells responding to specific antigens. We will develop machine learning algorithms which will be used to classify these T cell receptors, and distinguish between antigen specific and antigen non-specific receptors. The project will also work with Microsoft Research to integrate the frequency data obtained with deterministic models of T cell activation, thus contributing to the overall aim of building a multi-scale model of the evolution of an antigen specific immune response.

3D Reconstruction of Live Scenes Using Multiple Heterogeneous Mobile Depth Cameras

Supervisor: Jan Kautz
Microsoft Research supervisor: Shahram Izadi

Research summary: Imagine an event like a concert, a play or even a formal ball, that is taking place at a distant location. Given its location, you cannot attend in person but you would still like to view and explore the event as if you were attending. Thus, you should be able to roam around freely and explore the event live from any viewpoint. Recent advances in camera technology, in particular depth-sensing (Kinect) and stereo-capable cameras (e.g., some mobile phones) from which depth can be derived, can aid in this context. We propose to use a heterogeneous network of Kinect cameras and stereo-capable mobile devices, where the Kinect cameras would be statically mounted and the 3D mobile devices would be worn / carried by attendees of the event. This proposed setup allows us to collect rich colour and depth data from many viewpoints. Given this input, we believe it possible to reconstruct enough information to facilitate inspecting the scene from any viewpoint at any time instance.

Probabilistic Databases of Multimodal and Universal Schema

Supervisor: Sebastian Riedel
Microsoft Research supervisor: Thore Graepel

Research summary: Databases can readily return answers explicitly stored in their tables, but it is difficult to query the implicit information captured in the database. To overcome this problem, we propose to develop probabilistic databases with relations corresponding to heterogeneous and multimodal content (number of Starbucks per km2), and relations corresponding to natural language surface patterns (“is gentrified”) appearing in text. The databases will automatically learn correlations between the multimodal and the surface pattern relations, and hence induce (some notion of) the meaning of surface patterns, grounded both with respect to other surface patterns, and to the databases’ multi-modal content. Querying implicit content then amounts to using the surface pattern relations in a query. Our approach to realize such databases will be based on latent feature models and matrix completion methods, and will require innovation in the intersection of IE, Semantics, Machine Learning and Databases. 

Joint Initiative in Informatics with Edinburgh University

Bayesian Probabilistic Programming for Security

Supervisor: David Aspinall
Microsoft Research supervisor: Andy Gordon

Research summary: Our hypothesis is that a probabilistic programming language will enable us to formulate computer security problems as Bayesian inference problems and to solve them using state-of-the-art inference algorithms. We want to investigate this with a PhD project using the language Fun invented at Microsoft Research. The project will code existing and new security analyses based on a range of Bayesian methods, such as those used from evaluation/attack of systems, estimating trustworthiness of network participants, measuring information hiding, anomaly detection, and forensics.

Provenance for Configuration Language Security

Supervisor: James Cheney
Microsoft Research supervisor: Dimitrios Vytiniotis

Research summary: Declarative, high-level configuration languages (e.g., LCFG, Puppet, Chef) are widely used in industry to configure large system installations. Configurations are often composed from distributed source files managed by many different users within different system and organisational boundaries. Users may make changes whose consequences are not easy to understand, and such systems also currently lack mature security access controls; the few currently available techniques have idiosyncratic behaviour and offer no formal guarantees. In the worst case, misconfiguration can lead to costly system failures; because of the complexity of the configuration build, it is difficult to recover from failures, trace the source of the error or identify the responsible party. In this project, we will explore the application of provenance techniques (originally developed in the context of databases) to establishing well-founded and effective techniques for security and audit for configuration languages. 

Performance Portability for Large-Scale Heterogeneous Systems

Supervisor: Christophe Dubach
Microsoft Research supervisor: Ant Rowstron

Research summary: Computing systems have become increasingly complex and hard to program. This is especially true for data centres since these systems have become more heterogeneous with the recent availability of GPUs. As a result, achieving high performance for such large-scale systems is an extremely challenging task. This problem is further exacerbated with each new hardware generation, which means software written and tuned for today's systems will need to be adapted frequently to keep pace with ever changing hardware.

This proposal is an attempt to tackle the inherent complexity of computing systems by: (1) abstracting away the details of these systems using a hierarchical hardware description, (2) providing a higher-level programming model that does not expose any specific hardware features, and (3) automatically tuning the software to the hardware using statistical techniques. The key idea is that software should be written only once and automatically tune itself for the underlying system.

Solving the Problem of Cascading Costs: Better Approximate Bayesian Inference for Data Pipelines

Supervisor: Iain Murray
Microsoft Research supervisor: John Winn

Research summary: Bayesian statistical methods give state-of-the-art performance in many machine learning applications. However, full joint Bayesian inference is rarely used in large-scale applications with several stages of data processing. Early stages of a pipeline are often seen as "pre-processing", processes outside of a statistical model, and model criticism is usually also a separate manual stage. We aim to provide tools that allow more steps of processing to be routinely part of probabilistic analyses. We hypothesise that new Monte Carlo fitting methods will give practical tools to achieve this aim, resulting in more accurate and trustworthy results for a variety of data-processing tasks involving pipelines. Successful proof-of-concepts will be developed into general tools, where possible as extensions of Microsoft Research’s Infer.NET project.