Machine Learning-Based Predictive Modelling of CRISPR/Cas9 guide efficiency
FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a set of tools for performing genome-wide association studies (GWAS) on large data sets. FaST-LMM runs on both Windows and Linux, and contains code for (1) univariate GWAS, (2) testing sets of SNPs, (3) feature selection for background correction, (4) epistatic association scans, and (5) a correction method for cellular heterogeneity in methylation and similar data.
Here are some of the events I've been involved in since joining the group!
OSLO is a .NET and Silverlight class library for the numerical solution of ordinary differential equations (ODEs). The library enables numerical integration to be performed in C#, F# and Silverlight applications. OSLO implements Runge-Kutta methods for non-stiff initial value problems and backward differentiation formulae (BDF) for stiff ones.
Project focussed on quantifying the economic and environmental trade-offs and synergies that arise under land use change.
Climatology gives you climate information for anywhere on Earth: temperature, rain and sunniness. Whether you are looking for warm, dry places to go on holiday in December, avoiding rain for your wedding, or finding out what the climate is like in Kazakhstan in April, Climatology lets you discover the information you want.
Microsoft Research developed a tool, in collaboration with IUCN, SSC and UAL, to allow the rapid mapping and assessment of species, threats to species and conservation interventions. It is hoped that this tool will improve bottom-up conservation monitoring.
A Bright Minds Internship is your opportunity to work on real-world projects alongside some of the brightest minds in computer science at Microsoft Research Cambridge.
Use consumer video equipment to trace animal movement.
Pluripotency is the unique characteristic of embryonic stem (ES) cells, which demonstrate the capacity to generate all somatic cell lineages. But how ES cells decide to transition to a given adult cell type remains unknown. In this project, we combine formal verification, model-checking and model synthesis into a new tool for uncovering the transcriptional program of pluripotency: a reasoning engine for interaction networks.
Face In The Crowd examines the social impact of crowdsourcing platforms—cloud-based computational systems that allow the outsourcing of work through open requests—and how they might shape the future of work.
Project focussed on the patterns of road development in tropical forests.
Tempe is a web service for exploratory data analysis.
Distribution Modeller (temporary name only!) is CEES' end-to-end tool that lets the researcher rapidly import data, supplement that data with environmental info from FetchClimate, specify an arbitrary model by point and click or in code, parameterize the model against the data using Filzbach, make and visualize predictions with a full propagation of parameter uncertainty – then package and share everything, in a way that is inspectable, repeatable, and modifiable.
Identifying and Visualizing Viral Content
Labs: New York
SNAP is a new sequence aligner that is 10-100x faster and simultaneously more accurate than existing tools like BWA, Bowtie2 and SOAP2. It runs on commodity x86 processors, and supports a rich error model that lets it cheaply match reads with more differences from the reference than other tools. SNAP was developed by a team from the UC Berkeley AMP Lab, Microsoft, and UCSF. Binaries are available at http://github.com/downloads/amplab/snap/
University College London and the University of Oxford have recently received funding from the EPSRC Cross-Disciplinary Interfaces Programme (2020 Science: Mathematical and Computational Modelling of Complex Natural Systems) to collaborate with Microsoft Research Cambridge on a programme of research that will involve up to 17 post-doctoral Research Associates over a five-year period.
We develop and accelerate better, predictive conservation science, tools and technologies in areas of societal importance. We aim to provide scientific support for effective environmental solutions for key decision makers, from the boardroom to government. We are committed to leveraging the unique position our group occupies to influence how individuals and nations approach and tackle issues such as natural resource scarcity and biodiversity loss.
These are two simple formulae to wrap latitude and longitude back to their proper ranges.
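The formulae themselves are not reproduced here, so the following is only a minimal sketch of one common way to do this. It assumes angles in degrees, a longitude range of [-180, 180) and a latitude range of [-90, 90]. The function names wrap_longitude and wrap_latitude are illustrative and are not taken from the original.

```python
def wrap_longitude(lon: float) -> float:
    """Wrap a longitude in degrees into the half-open range [-180, 180)."""
    # Shift so the target range starts at 0, take the modulo, shift back.
    # Python's % returns a non-negative result for a positive divisor,
    # so negative inputs are handled without a special case.
    return (lon + 180.0) % 360.0 - 180.0


def wrap_latitude(lat: float) -> float:
    """Fold a latitude in degrees into [-90, 90] by reflecting at the poles."""
    # A triangle wave with period 360: going past +90 comes back down,
    # going past -90 comes back up.
    return abs((lat - 90.0) % 360.0 - 180.0) - 90.0
```

Note that a point carried over a pole also ends up on the opposite meridian. This sketch treats the two coordinates independently and does not apply that 180 degree longitude shift.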
While Amazon has already made the genomes in the 1000 Genomes Project accessible (via S3), there is no accompanying abstraction that lets a biologist or doctor interactively pick, across the network, whatever portion of the vast data (250 Gbytes per sequence) they wish. We would like to do something similar in a storage platform such as Azure, but where access can be done by what we call a Genome Query Language (developed with folks at UCSD).
By combining large data sets with cutting-edge computational methods (often of our own devising) and new technology for sensing the natural world, we aim to address these fundamental questions: How is biodiversity distributed across the Earth's surface? Where are the most pressing threats to biodiversity, in which places, and over what timescales? And what can we do to minimize those threats?
A Systems and Software Perspective
Environmental Informatics Framework (EIF) is a strategy for using cutting-edge Microsoft technologies to advance environmental data discoverability, accessibility, and consumability.
A cloud-based user experience, Microsoft Layerscape makes it easy for the Earth-sciences community to visualize and analyze large, complex datasets to facilitate the discovery of new environmental insights into Earth. By using powerful, everyday tools like Microsoft Excel, Layerscape enables users to explore new ways of looking at Earth and oceanic data, and build predictive models in areas such as climate change, health epidemics, and oceanic shifts.
In recent years, the computational challenges of inferring biologically relevant information from the vast amount of experimental data available to systems biologists have become increasingly important.