This call was motivated by the Towards 2020 Science report, which outlines the
convergence of computer science and the natural sciences towards 2020. The
implications of the report are further discussed in a special issue of Nature
inspired by the report [Nature 440 (7083); 23 March 2006].
A crucial observation of the Towards 2020 Science report is that
concepts, theorems and tools developed within computer science are now being
developed into new conceptual tools and technological tools of potentially
profound importance, with wide-ranging applications outside the subject in which
they originated, especially in sciences investigating complex systems, most
notably in biology, chemistry and earth sciences.
Microsoft Research selected three proposals aimed at creating prototypes of new
kinds of conceptual and technological tools – tools that are motivated by a
specific need in scientific research and that have the potential to
significantly advance science. Priority was given to investigations into two
broad global challenges:
- Earth’s Life Support System including both abiotic subsystems (e.g.,
geophysics and climate research) and biotic subsystems (e.g., biodiversity
and ecosystems), and their interaction.
- Biology in a broad sense, including synthetic, systems and organismic
biology.
From a computational perspective the following methods of handling complexity
and integrating theory, experiment and models were of major importance:
- Statistical approaches, in particular machine learning, to predict the
properties and behaviour of a system when the complexity of the domain, or
the absence of sufficiently precise models at the appropriate level of
description, prohibit a first-principles simulation with any current or
conceivable future level of computational resource. Also, the development of
machine learning tools suitable for the informed analysis of vast and/or
heterogeneous datasets to assist in hypothesis generation.
- Codification of scientific knowledge about complex systems, i.e. turning
knowledge into a coded representation, in terms of data or programs, that is
mechanically executable and analysable, or made suitable for efficient and
informed modelling.
Award Recipients
Development and application of a computerized and automated method for cell lineage analysis
Ehud Shapiro, Weizmann Institute of Science, Israel
A multi-cellular organism develops from a single cell, the zygote,
through numerous cell divisions and cell deaths to display an
astonishing complexity of trillions of cells of different types,
residing in different tissues and expressing different genes. This
complex developmental program is still mostly unknown. The question of
lineage relations between the cells of an organism is of interest not
only in the field of developmental biology but also in fields such as
cancer research and stem-cell research, where processes of tissue
maintenance and tumour formation are still largely unknown. Extant
methods to reconstruct lineage relations are very limited and usually
invasive. Our group developed a method for reconstructing the lineage
relations among cells of a multicellular organism. The method is based
on the fact that somatic mutations accumulated during normal development
of a higher organism contain information that enables reconstruction of
the organism's cell lineage tree. The system takes as input DNA samples,
processes them using a liquid handling laboratory robot, analyzes the
signals obtained from a capillary electrophoresis machine to detect
mutations, processes the mutations using a phylogenetic algorithm, and
outputs a tree representing the lineage relations between the samples
from which DNA was obtained. We have demonstrated this system in an
initial study that analyzed artificial ex-vivo cell trees. We are now
embarking on a large-scale collaborative study, the 'Mouse Cell Lineage
Project', whose aim will be to reconstruct the entire mouse cell lineage tree.
This task poses unique and challenging algorithmic and theoretical
problems of making statistical inferences on cellular populations,
inferring the relations between lineage and tissue locations, geometric
location and gene expression, detecting lineage boundaries and devising
novel strategies for phylogenetic reconstructions. While the lineage
challenge resembles problems that have been addressed in the fields of
population genetics and species phylogenetics, it has unique
characteristics that necessitate novel mathematical models and
algorithms. Thus, whereas mathematical and algorithmic work has made
profound advances in the fields of inter-species and intra-species
phylogenetics, the aim of this work will be to lay the foundation of a
new scientific field – cellular phylogenetics. The fruits of this
collaborative biological and computational project will be an
understanding of developmental programs and of tissue maintenance
processes, both normal and following disease, in complex multi-cellular
organisms. This will have a profound biological and medical impact.
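The final step of the pipeline described above, going from detected mutations to a lineage tree, can be illustrated with a minimal sketch. The abstract does not name the phylogenetic algorithm used, so the sketch swaps in simple average-linkage (UPGMA-style) clustering on Hamming distances purely for illustration. The sample names, the binary mutation profiles and the function names are all hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Count the positions at which two mutation profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def average_linkage_tree(profiles):
    """Build a nested-tuple tree by repeatedly merging the two closest clusters.

    profiles: dict mapping sample name -> tuple of mutation states.
    Cluster distance is the mean pairwise Hamming distance of the members.
    This is an illustrative stand-in, not the project's actual algorithm.
    """
    # Each cluster is (subtree, list of member sample names).
    clusters = [(name, [name]) for name in sorted(profiles)]

    def dist(c1, c2):
        pairs = [(m1, m2) for m1 in c1[1] for m2 in c2[1]]
        total = sum(hamming(profiles[m1], profiles[m2]) for m1, m2 in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical samples: each tuple marks mutated (1) or unmutated (0) loci.
samples = {
    "cell_A": (0, 0, 1, 1, 0, 0),
    "cell_B": (0, 0, 1, 1, 0, 1),
    "cell_C": (1, 1, 0, 0, 0, 0),
    "cell_D": (1, 1, 0, 0, 1, 0),
}
tree = average_linkage_tree(samples)
print(tree)  # (('cell_A', 'cell_B'), ('cell_C', 'cell_D'))
```

Cells that share more somatic mutations are grouped first, so cells A and B end up as siblings, as do C and D. A real reconstruction would need a mutation model and a statistically grounded method, which is exactly the open problem the abstract describes.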
Analysis of animal ecological and social networks with programmable sensor nodes
Klaus Wehrle, RWTH Aachen University, Germany
Natural behaviour of
animals takes place in complex environments, allowing for a wealth of social and
ecological interactions. While laboratory studies have been extremely useful to
identify individual mechanisms of behaviour, the functioning of such behaviour
in natural environments is still only poorly understood. Efficient means of
animal monitoring in the wild as well as tools for modelling complex systems are
required for a deeper understanding of phenomena such as spatial cognition,
optimal foraging, social behaviour and learning, or multi-species interactions.
Current telemetric approaches to animal monitoring are often limited by the
range and bandwidth of radio-transmission, especially in large, subterranean, or
under-water environments. In this project, we will develop a novel system for
animal surveillance in the wild, using tiny sensor node technology. Programmable
sensor nodes with a multitude of sensing capabilities attached to the animals
will record data such as motion, vocalizations, and body temperature of the
carrier. Upon encountering another animal, the sensor nodes will interact,
exchanging and aggregating data on the time and participants of the meeting.
Stationary base
nodes at occasionally visited, but easily accessible locations will be used to
collect the animal data for further analysis, including trajectory
reconstruction, daily activity profiles, and interaction graphs. The project
brings together experience in sensor network technology (both hard- and software
development) and animal experimentation (behavioural experiments in virtual
reality, transponder systems for outdoor monitoring). Initial experiments will
deal with laboratory rats exploring burrow-like tube systems, while further work
will extend to subterranean and outdoor settings. The final monitoring system
will be instrumental in elucidating social and ecological interactions in
hard-to-observe animals including subterranean or marine mammals and
cave-dwellers such as bats and flying foxes. Sensor networks will be employed to
acquire data which are out of reach for conventional methods.
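As a rough illustration of how encounter records collected at a base node might be turned into an interaction graph, consider the sketch below. The record layout (timestamp, recording animal, encountered animal), the animal identifiers and the 60-second merge window are assumptions made for this example; the abstract does not specify a data format.

```python
from collections import defaultdict

# Hypothetical encounter log: (timestamp in seconds, recording node, other node).
encounters = [
    (100, "rat1", "rat2"),
    (105, "rat2", "rat1"),   # same meeting, logged by the other node
    (300, "rat1", "rat3"),
    (900, "rat2", "rat3"),
    (960, "rat3", "rat2"),   # still within the merge window of the 900 s record
    (2000, "rat1", "rat2"),  # a separate, later meeting
]

def build_interaction_graph(records, merge_window=60):
    """Aggregate encounter records into an undirected weighted graph.

    Returns a dict mapping a sorted (animal, animal) pair to the number of
    distinct meetings. A record for a pair that arrives no more than
    merge_window seconds after the previous record for that pair is treated
    as part of the same meeting.
    """
    last_seen = {}
    graph = defaultdict(int)
    for timestamp, a, b in sorted(records):
        pair = tuple(sorted((a, b)))
        if pair not in last_seen or timestamp - last_seen[pair] > merge_window:
            graph[pair] += 1
        last_seen[pair] = timestamp
    return dict(graph)

graph = build_interaction_graph(encounters)
for pair, meetings in sorted(graph.items()):
    print(pair, meetings)
```

Because both nodes in a meeting may log it, the pair is normalised by sorting and nearby records are merged, so the edge weight counts meetings rather than raw records.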
Bayesian System Identification for Biological Pathway Modelling
Mark Girolami, University of Glasgow, United Kingdom
Biological pathway models form the basis of what is referred to as Systems
Biology, where biological systems are modelled mathematically and their
behaviours, under varying simulated conditions, are studied in assessing the
validity of a particular hypothesis regarding the nature of the system. Much
hope is placed on the synergistic advances which will be made by the interplay
between abstracted modelling and experimental investigation within a systems
biology context. Biochemical pathway models are highly parameterised systems of
deterministic ordinary differential equations (ODEs) which will exhibit a
potentially diverse set of behaviours dependent upon the values of these
parameters. The values of parameters such as kinetic rate constants are largely
unknown or at best have been ill-defined experimentally. The current widely
accepted methodology for system identification consists of simple
non-probabilistic parameter fitting methods such as least-squares, which fail
to fully identify such models and to characterise the uncertainty in the
experimental data, the model topology, parameter values and subsequent
predictions made by the model. The
adoption of a Bayesian viewpoint in developing pathway models allows full
characterisation and analysis of all model uncertainty and provides a consistent
way in which to reason about models and subsequent experimental design. This
project proposes the development of computational tools for biochemical pathway
modelling which will employ full Bayesian inference for system identification
and which will overcome the major weaknesses associated with the
non-probabilistic methodologies current in systems biology.
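To make the contrast with point estimates concrete, here is a minimal sketch of Bayesian inference for a single kinetic rate constant. It is not the project's method: the one-species decay model dx/dt = -k x, the flat prior on 0 < k < 5, the known noise level, the synthetic data and the random-walk Metropolis sampler are all assumptions chosen to keep the example small.

```python
import math
import random

def simulate(k, times, x0=1.0):
    """Closed-form solution of the toy ODE dx/dt = -k*x with x(0) = x0."""
    return [x0 * math.exp(-k * t) for t in times]

def log_posterior(k, times, data, sigma):
    """Unnormalised log posterior: flat prior on (0, 5), Gaussian noise."""
    if not 0.0 < k < 5.0:
        return -math.inf
    preds = simulate(k, times)
    return -sum((d - p) ** 2 for d, p in zip(data, preds)) / (2 * sigma ** 2)

def metropolis(times, data, sigma, n_steps=5000, step=0.05, seed=0):
    """Random-walk Metropolis sampler for k; returns samples after burn-in."""
    rng = random.Random(seed)
    k = 1.0
    lp = log_posterior(k, times, data, sigma)
    samples = []
    for _ in range(n_steps):
        proposal = k + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, times, data, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            k, lp = proposal, lp_new
        samples.append(k)
    return samples[n_steps // 5:]  # drop the first 20% as burn-in

# Synthetic data from a hypothetical "true" rate constant of 0.7.
TRUE_K, SIGMA = 0.7, 0.02
times = [0.5 * i for i in range(1, 11)]
noise = random.Random(1)
data = [x + noise.gauss(0.0, SIGMA) for x in simulate(TRUE_K, times)]

samples = sorted(metropolis(times, data, SIGMA))
posterior_mean = sum(samples) / len(samples)
lower = samples[int(0.025 * len(samples))]
upper = samples[int(0.975 * len(samples))]
print(f"posterior mean k = {posterior_mean:.3f}, "
      f"95% interval = ({lower:.3f}, {upper:.3f})")
```

A least-squares fit would return a single value of k, whereas the sampler returns a distribution from which an interval can be read off. Real pathway models have many parameters and no closed-form solution, so they would need a numerical ODE solver and a more capable sampler.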