|
2006 Scholars
Some of the students are sponsored together with the
Irish Research Council for Science, Engineering and Technology.
Ana
Costa e Silva
University of Edinburgh, United Kingdom
Supervisor: Dr David
Robertson
Microsoft Research Supervisor: Prof. Christopher Bishop
Research title:
Extraction of Tabular Information from Unstructured Documents in a
Peer to Peer System
Research summary: The principal aim
of this project is to study how tabular data from traditional
(loosely structured) sources may be extracted and integrated using
methods of multi-agent coordination (specifically the LCC language
and associated methods being developed in the OpenKnowledge
project); ontology representation and (where applicable) machine
learning. Similar existing methods are either so general that they
cannot guarantee the accuracy of the extracted information without
extensive user intervention, or so context-dependent that they do
not operate outside their very specific domains. They are also
focused on (inevitably somewhat unreliable) extraction from specific
local sources which means that the integration issues when one has
to combine information obtained from different sources in different
contexts (and perhaps by different means) is not addressed. Using
LCC we shall study the broader picture required for integration and
address the issues of uncertainty and ontology mapping that emerge.
We speculate that by limiting our attention to a specific form of
data structuring (tabular) we will be able to perform better
integration without becoming domain specific.
António
José dos Reis Morgado
University of
Southampton, United Kingdom
Supervisor: Dr Joao
Marques-Silva
Microsoft Research Supervisor: Dr Lucas Bordeaux
Research title:
Haplotype inference with boolean satisfiability
Research summary:
Mutation in DNA is the principal cause for differences among human
beings, and Single Nucleotide Polymorphisms (SNPs) are the most
common mutations. Hence, a fundamental task is to complete a map of
haplotypes (which identify SNPs) in the human population. Associated
with this effort, a key computational problem is the inference of
haplotype data from genotype data, since in practice genotype data
rather than haplotype data is usually obtained. Different haplotype
inference approaches have been proposed, two of which are the most
promising in terms of accuracy: the utilization of statistical
methods and the utilization of the pure parsimony criterion. Recent
work has shown that a SAT-based approach is by far the most
efficient approach for the problem of haplotype inference by pure
parsimony (HIPP), being several orders of magnitude faster than
existing integer linear programming and branch and bound solutions.
This work proposes to develop SAT-based haplotype inference in
several strategic directions. The objective is to make SAT-based
approaches the de facto solution to the haplotype inference problem.
Arie
Middelkoop
Utrecht University, Netherlands
Supervisor: Dr Atze Dijkstra
Research title: Coping with complexity in compiler construction
Research summary: As part
of the development of a series of compilers (jointly called the EHC)
for an extended subset of the programming language Haskell we have
designed the Ruler language. From Ruler programs we can both
generate the type rules involved (in LaTeX) and the attribute
grammars that describe the corresponding implementation. In this way
we keep documentation and implementation synchronized. One of the
nice aspects of the Ruler language is that we can present the final
product in a sequence of steps. In the generated output differences
are clearly indicated, which makes the study of these there type
rules much easier. Having such a unified description opens a whole
new route for compiler construction, which we propose to follow. By
extending Ruler to a full programming language - e.g. by
incorporating global program analyses, and the possibilities for
specifying transformations of Ruler programs explicitly - we will be
able to capture many common compiler idioms in a uniform way; a
typical example of such an idiom can be found in the way Hindley-Milner
type inference uses environments and type variables to solve the set
of constraints specified by the type rules. Once we have generated
attribute grammars we can of course apply all the traditional
scheduling algorithms that enable us to compute the attributes in
way that is both time and space efficient. Furthermore we expect our
compilers to become much easier to adapt: we may switch between
different constraint solving strategies, error-correction
strategies, or instrument the compiler with extra computations that
will enable efficient incremental evaluation. Finally we may use
Ruler programs as a starting point for proving properties about the
described type system.
Aryeh
Rowe
Royal Holloway, University of London, United Kingdom
Supervisor: Dr Jason Crampton
Microsoft Research Supervisor: Dr Andy Gordon
Research title: Unified scalable access control
Research summary: Access
control in open distributed computer systems is a challenging
problem, which manifests itself at very different scales. At one end
of the scale we need to control access by mobile code to a host
machine, while at the other we need to provide different levels of
access for millions of users of web services. Research in the last
ten years has developed a range of solutions to this problem. Each
of these solutions is devised for the problem at one particular
scale. We believe that a unified model that provides a generic
access control solution at all scales is required. Such a model will
inevitably be informed by existing approaches, such as matching
attributes of the entity requiring access to the roles defined
within the environment containing the protected resource. However,
we also believe that is necessary to develop a new model for
authorization, based on formalisms that have been specifically
developed for distributed computation such as the pi calculus or I/O
automata.
Christian Steinruecken
University of Cambridge, United Kingdom
Supervisor: Prof. David MacKay
Microsoft Research Supervisor: Dr John Winn
Research title: Learning to recognise hierarchies of objects and
scenes
Research summary: Accurate
recognition of objects in images is critical to a very large range
of applications, from the self-drive car to aids for the blind. In
order to recognise very large numbers of object classes, it is
necessary to organise these classes in an appropriate hierarchy or
hierarchies, to exploit similarities and relationships between the
classes. The proposed research will investigate various
organisational hierarchies for object classes and scenes, and
demonstrate their advantages both for accurate recognition at
appropriate levels of specificity, and for rapid learning of new
classes. The project will culminate with the development of a large
scale object recogniser that exploits the hierarchical class
structure.
Christoph
Rhemann
Vienna University of Technology, Austria
Supervisor: Dr Margrit Gelautz
Research title: New approaches to video matting: Eye-tracking and
parallelization
Research summary: In this
PhD project, we want to perform cutting-edge research in video
matting by developing new and efficient algorithms to quickly
extract high-quality video objects from natural image sequences. We
will develop a new level of advanced user interaction techniques
based on eye-tracking technology and parallelization. This will
allow us to design interactive matting approaches with iterative
improvement of the matte by switching between user interaction and
computing intermediate results at high processing rates.
 Clíona
Roche
University College Dublin, Ireland
Supervisor: Dr Gianluca Pollastri
Research title: High-throughput comparative modelling of protein
structure by machine learning
Research summary: This
proposal aims at: designing and testing new predictive algorithms
for protein structural features that incorporate template
information from the PDB, when this is available; designing
algorithms for comparative modelling of protein structures relying
on template-based predictions; benchmarking on a large scale the
quality of predictions, and developing an index of reliability akin
to the resolution measure provided with X-ray crystallography;
producing comparatively modelled full 3D protein structures on a
multi-genomic scale; making these predictions publicly available as
large-scale databases. These resources may become a fundamental
source of information for scientists trying to elucidate the nature
of bio-molecular processes, and may allow the development of new
bioinformatics methods, e.g. protein similarity search tools based
on predicted structures.
 Conor
Cafferkey
Dublin City
University, Ireland
Supervisor: Prof. Josef van Genabith
Research title: Improving Machine Learning- and Treebank-Based
Probabilistic Parsing through Wide-Coverage Lexical Resources
Research summary: The aim
of this research is to explore a new methodology to improve
probabilistic parser output based on combining two largely
independent research strands: machine learning-based probabilistic
parsing on the one hand and existing extensive lexical resources on
the other.
Daniel
Cederman
Chalmers University of
Technology, Sweden
Supervisor: Prof. Philippas Tsigas
Research title: Non-blocking synchronization for multi-core
processors
Research summary: The aim
of this thesis is to investigate the impact of properties of
multi-core processors in the design and performance of non-blocking
data synchronization data structures. One specific property that
these systems have is the fact that processors are placed on the
chip and therefore have direct on-chip communications paths without
the need to go out to external buses. Multithreading has mainly been
something for server software and for heavy computational software
designed to run on supercomputers. Multithreading for desktop
applications though has not been as common. When the plethora of
desktop applications have to adapt to multiprocessing and
multithreading it is more important than ever to provide simple
interfaces to these algorithms. Therefore this is an important goal
for the thesis to achieve.
Guillem
Rull Fort
Technical
University of Catalonia (UPC), Spain
Supervisor: Dr Ernest Teniente
Research title: Validation of mappings between data models
Research summary: The main
goal of this research is to propose a method for testing whether a
given mapping between models satisfies some desirable properties.
The waywe expect to achieve it is through an extension of the CQC
Method which we successfully applied to database schema validation.
Iris
Miliaraki
National And Kapodistrian
University of Athens, Greece
Supervisor: Prof. Manolis Koubarakis
Microsoft Research Supervisor: Dr Ant Rowstron
Research title: Continuous RQL query processing on top of
distributed hash tables
Research summary: We
propose to study the efficient evaluation of continuous RQL queries
on top of distributed hash tables. RQL is a declarative query
language for RDF/RDFS databases with the ability to express both
data and schema queries in a uniform manner. It is used in two of
the most prominent RDF stores, Sesame and RSSDB. Distributed hash
tables are overlay networks that allow nodes holding data items to
self-organize and offer data lookup functionality in a provably
efficient, scalable, fault-tolerant and adaptive way. The problem of
efficient evaluation of continuous RQL queries on top of distributed
hash tables is essentially an open problem at the moment. We plan to
extend previous work done in our group in order to deal with
conjunctive queries in RQL. Then, we will study how to integrate
RDFS reasoning in our algorithms so that schema queries in RQL are
answered efficiently. Our next step will be to consider special
cases of conjunctive queries (e.g., path queries) that will probably
be amenable to more efficient query evaluation strategies. Finally,
we will consider nested RQL queries and multiple query optimization
issues. Our experimental work will be done by simulation or by using
large scale distributed infrastructures such as PlanetLab. We will
seek to demonstrate what trade-offs are possible between various
performance metrics that need to be optimized in this setting e.g.,
number of hops, network latency, network bandwidth and load
distribution under various data/query workloads.
Jorge
Salas López
Universidad Politécnica de Madrid, Spain
Supervisor: Prof. Ricardo Jiménez Peris
Research title: Large scale data streaming
Research summary: There
are many data streaming applications that will have to cope with
massive data streams in which large clusters will be required to be
able to cope with the input data stream. We foresee two classes of
applications: 1) applications with many continuous queries that can
scale by distributing the queries among a pool of sites; 2)
Applications with one or a few queries but very massive data that
will require parallelizing query operators among a large number of
sites to be able to cope with the massive input streaming data. In
order to realize a data streaming system able to satisfy the
requirements of these two classes of applications for very large
clusters (e.g. over 100 sites) it is a real challenge from a
scientific point of view. We believe that the advances in Storage
Area Networks (SANs) will enable a new kind of distributed support
for data streaming. On one hand, SANs provide high-bandwidth low
latency communication that enables end-to-end inter-host memory
access in the realm of microseconds. This can be exploited for fast
coordination among sites and agile load balancing. On the other
hand, these networks are characterized by having Network Interface
Cards (NICs) with their own processor with a reasonable capacity and
able to access the host memory through DMA. Our previous work
enables us to foresee that the envisioned highly scalable data
streaming platform can be realized.
Lynne
Hamill
University of Surrey,
United Kingdom
Supervisor: Prof. Nigel Gilbert
Microsoft Research Supervisor: Prof. Richard Harper
Research title: The relationship between personal communication
technologies and travel
Research summary:
Developments in communications technologies over the last few
decades are such that it would be expected that the demand for
travel would fall. Yet the demand for both communication and travel
has soared. The key question to be addressed in this research is: in
what circumstances do new communication technologies increase the
demand for travel and in what circumstances do they reduce it?
Communicating costs time and money, and the distribution of both
types of costs varies with the nature of the channel between sender
and receiver. Different channels provide different benefits but the
size of the communication network is probably the most important
(Metcalfe’s Law). Drawing on sociology, economics and social
psychology, historical and contemporary case studies will be used as
the basis for developing social simulation models using dynamically
interacting rule-based agents that are capable of accounting for the
observed macro relationships. Data on travel and the use of
communication technologies will be extracted from publicly available
nationally representative sources such as the Family Spending,
General Household and National Travel surveys. Limited original
qualitative data collection may be undertaken by for example using
diaries and interviews.
Marije Geldof
Royal Holloway, University of London, United Kingdom
Supervisor: Prof. Tim Unwin
Research title: The role of ICT in empowering people with
low-literacy levels in Africa
Research summary: The aim
of this research is to identify optimal ways in which Information
and Communication Technologies (ICTs) can contribute to empowering
the lives of people with low-literacy levels in Africa. The
particular contribution to this field will be through the research
at the interface between interaction design and literacy. The first
stage of the research has been a review of existing knowledge on the
concept of literacy and interaction design. The next step will be
field research in low-literate communities in Africa to identify the
needs of the people that can improve their daily lives. Focus groups
will be used to identify the necessary skills that are important in
the daily life of the community. These will provide the basis for a
‘virtual card’ set that will be used to help community members
categorize the skills and activities. This task will result in a
‘map’ representing the preferences and abilities of participants,
which will be the starting point for more in-depth interviews. A
meta-analysis of the results of the field research will subsequently
be used to develop a model on how to enhance and enable the
identified needs through the use of new technologies.
Martin
Rohrmeier
University of Cambridge, United Kingdom
Supervisor: Dr Ian Cross
Research title: Implicit learning of musical structure: experimental
and computational approaches
Research summary: The
proposed research aims to clarify our understanding of syntactic
aspects of music by means of an integrated programme of empirical
and computational studies. Investigation of the implicit learning of
musical systems should provide insight into how humans actually
acquire and apply knowledge about complex systems such as music and
language, which rules they grasp (or not), and how they represent
such knowledge. The research will focus primarily on melody and
harmony as central domains in systematic and cognitive musicology,
focusing initially on the formalisation of melodic structure. This
will involve statistical approaches and will incorporate experiments
on implicit learning of melodic structure as well as data-mining
oriented corpus analysis. Subsequently, modelling work on harmony
(in J. S. Bach’s chorales) that Mr Rohrmeier has already undertaken
will be extended so as to formalise stylistic harmonic structure by
means either of lexical-functional grammar or optimality theory.
Experiments on harmony learning and tonality induction will be
undertaken, and the results explored in the context of those of
large-scale corpus analysis. Results from these two domains are
expected to be applicable in automated musical style classification.
In combination with experiments on subjects’ identification of
relevant, information-carrying parameters, results may help improve
current approaches.
Michael Johnson
University of Limerick, Ireland
Supervisor: Dr Martin Hayes
Research title: New machine learning paradigms for robots operating in a dynamic team based environment
Research summary: This project will address the general problem of cooperative learning for a team of independent robots attempting to satisfy a common performance objective. The focus of the work will be on characterising the effect that a low power ad hoc network will have on the performance of a collective task. In particular the project will investigate how loss of sensor information or sensor malfunction impacts on robot performance in the pursuit of the objective(s) at hand. An example scenario would be robotic terrain mapping applications, security and/or biomed. In particular the project will ask how can energy consumption be optimised when certain (numerically well defined) doubts exist on the accuracy of the sensor feedback data.
Michael
Pedersen
University of Edinburgh, United Kingdom
Supervisor: Prof. Gordon Plotkin
Research title: High level languages for systems biology
Research summary:
Biologists use informal pictorial representations to express
metabolic, signalling and regulatory pathways. However
standardization is necessary for systematic model building and
Goryanin et al are co-developing a standard notation, SBGN, to that
end. However, the resulting diagrams are monolithic with no version
support. Computer science experience show that linguistic approaches
permit modular descriptions, and allow version support. Plotkin has
developed a ‘Calculus of Chemical Systems’ to naturally express
chemical reactions between (complexes of) modified proteins in
nested compartments. It is translatable to diagrams (Petri nets) and
has both qualitative and quantitative dynamics (stochastic and ODE).
The student will design a yet higher-level language with primitives
corresponding to those of SBGN, and expressiveness being
demonstrated via a natural modular description of the interferon
signalling pathways in macrophages. Translators between programs and
diagrams will be written, allowing both pictorial representation of
modular structure, and screen input of system components.
Compilation to Petri nets will permit both qualitative and
quantitative simulations. Biological specification languages may
also be investigated. Modal logics as previously used are too low
level and a more high level succinct language is needed natural for
biologists, compiled to a low level language, and so linked to a
standard model-checking package.
Michael
Smith
University of Edinburgh, United Kingdom
Supervisor: Dr Jane Hillston
Research title: Automated stochastic abstraction of programs
Research summary: We will
investigate techniques for automatically deriving a stochastic model
in PEPA from the code of a distributed system. The ultimate aim is
to use existing highly successful methods for performance analysis
of such models to analyze such systems directly, rather than only
analyzing independently-constructed models.
Michael
Verhoek
University of Oxford, United Kingdom
Supervisor: Prof. Alison Noble
Microsoft Research Supervisor: Prof. Andrew Blake
Research title: Graph cuts applied to 3D+time ultrasound image
spatio-temporal segmentation of the heart
Research summary: We
propose to investigate graph-cut based methods as the basis for
spatio-temporal segmentation of 3D+time ultrasound sequences of the
heart. This research will further the understanding of how graph-cut
methods can be applied to solve spatio-temporal problems as well as
potentially offer a new way to automatically derive quantitative 3D
information about the health of the heart in a robust and efficient
way. The latter would pave the way for cardiologists to make more
informed decisions about disease assessment and treatment monitoring
on the millions of people world-wide who have cardiac conditions.
Milan
Raicevic
University of Durham, United Kingdom
Supervisor: Prof. Carlos Frenk
Research title: The end of the cosmic dark ages
Research summary: We
propose a programme of supercomputer simulations of the evolution of
the early universe. These will target the end of the ‘cosmic dark
ages’ that followed the release of the heat of the primordial
fireball 400,000 years after the Big Bang. The dark ages lasted
about 500,000,000 years and were brought to an end by the formation
of the first stars and quasars. Their radiation ionized the
primordial gases making the universe transparent again to UV
radiation. Little is known about the process of reionization. We do
not know what exactly caused it, how long did it take or how the
subsequent formation of galaxies was affected by it. To understand
reionization it is essential to calculate how photons or particles
of light propagate in the primordial soup of hydrogen and helium.
Solving this radiative transfer problem can only be done by direct
computer simulation and even then the problem is hugely challenging.
We propose to develop novel techniques to tackle this problem and to
implement them in a cosmological simulation code that follows the
coupled evolution of dark matter and gas in the early universe. The
simulations will be carried out in the Cosmology Machine at Durham,
one of the largest supercomputers for academic research in Europe.
The goal is to quantify how the universe emerged from the dark ages
and to make predictions for observable properties that will be
tested with the new generation of ground- and space-based
observatories that are currently being built.
Sara
Vicente
University College London, United Kingdom
Supervisor: Dr Vladimir Kolmogorov
Microsoft Research Supervisor: Dr Carsten Rother
Research title: Optimisation techniques for computer vision
applications
Research summary: Markov
Random Field (MRF) model has become an indispensable tool in
computer vision and graphics. This model captures the fact that
images are spatially coherent. In the past most research has been
focused on simple MRF models, for example, Potts, quite likely due to the
lack of efficient MRF optimisation techniques for more complex
models. At the same time, complex MRFs are much more powerful; their
importance for vision has been demonstrated in more recent articles.
The goal of this PhD work is twofold: (i) develop novel and
efficient optimisation algorithms for complex MRF models; (ii)
extend MRF approaches to new and challenging computer vision and
graphics applications, such as object recognition and alpha matting.
It is important to note that these two tasks are typically coupled
since a new application often requires an objective function which
cannot be handled efficiently by existing methods.
Thomas
Wies
University of Freiburg, Germany
Supervisor: Prof. Andreas Podelski
Microsoft Research Supervisor: Dr Byron Cook
Research title: Symbolic data structure analysis
Research summary: The research topic is
symbolic data structure analysis. The major theme of this topic is
to combine program analysis techniques with symbolic reasoning (that
is, with theorem
proving) to analyze programs with unbounded data structures. Until
now, software model checkers are inherently limited when it comes to
the analysis of programs with heap-allocated data structures such as
lists, trees, and dynamically allocated arrays. Therefore existing
software model-checkers are not able to verify non-trivial
properties of such programs (for example, that the output of a list sorting
routine is a sorted list). Symbolic data structure analysis combines
existing data structure analysis techniques (such as shape analysis)
with symbolic reasoning (for example, with the theorem prover for tree
structures MONA). This symbolic approach has potential advantages
over existing (model-based) data structure analysis. First, symbolic
data structure analysis generalizes techniques such as predicate
abstraction which are used in software model checking. Therefore it
smoothly fits into this approach and makes it easier to integrate
this analysis into existing software model checkers. Second, it
allows combined reasoning over multiple data types, for example, numbers
and lists (needed for properties such as sorted-ness). In existing
approaches such reasoning is very limited and requires substantial
manual interaction. Next, the success of software model checking has
shown that symbolic approaches tend to scale better in practice than
non-symbolic approaches. In fact, the existing tools for data
structure analysis are only applicable to small programs. Finally,
this new approach has the potential of leading to the development of
counter example-guided abstraction refinement. This promises an
unmatched degree of automation with regard to existing approaches.
Umar
Mohammed
University College
London, United Kingdom
Supervisor: Dr Simon Prince
Microsoft Research Supervisor: Dr Andrew Fitzgibbon
Research title: Interactive video characters
Research summary: Creating
and animating realistic human characters for interactive computer
graphics is time-consuming and requires considerable expertise. A
typical approach would create a simplified mesh describing the
geometry and texture the surface with appropriate photographic
images. Each polygon is associated with one or more bones and these
are animated using motion capture data. The resulting characters can
be flexibly controlled in real time. However, despite this complex
procedure, they often look unrealistic and lack expression. In
contrast, video data of real people is easy to capture, and produces
very high quality footage. Unfortunately, there is no method for
flexibly animating video footage of real people, so video footage
cannot be used in interactive applications such as games. Our goal
is to develop methods to blend together short captured video
sequences of an actor to create a continuous video representation
suitable both for creating pre-rendered sequences and for
interactive applications such as games. In short, we aim to build a
patch-based generative model of the space-time video footage of a
single actor. We treat blending from between 2D/3D video-subsequences
as a constrained texture synthesis problem within this model. We aim
to produce a practical suite of tools for animators.
back to top
|