An occasional forum for delivering academic computational biology talks. All talks are open to the public.
Title: Modeling molecular heterogeneity between individuals and single cells
Speaker: Oliver Stegle
Affiliation: European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI)
Host: Jennifer Listgarten and Nicolo Fusi
Date: Monday, May 11th, 2015
Time: 2:00 PM - 3:00 PM
The analysis of large-scale expression datasets is often compromised by hidden structure between samples. In the context of genetic association studies, this structure can be linked to differences between individuals, which can reflect their genetic makeup (such as population structure) or be traced back to environmental and technical factors. In this talk, I will discuss statistical methods to reconstruct this structure from the observed data to account for it in genetic analyses. By incorporating principles from causal reasoning, we show that critical pitfalls of falsely explaining away true biological signals can be circumvented. In the second part of this talk, I will extend the introduced class of latent variable models to account for unwanted heterogeneity in single-cell transcriptome datasets. In applications to a T helper cell differentiation study, we show how this model allows for dissecting expression patterns of individual genes and reveals new substructure between cells that is linked to cell differentiation. I will finish with an outlook on modeling challenges and initial solutions that enable combining multiple omics layers that are profiled in the same set of single cells.
Biography: Oliver Stegle is a group leader at the European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI) in Cambridge, UK. His group develops statistical methods to analyse high-dimensional molecular traits both in the context of genetic association and single-cell biology. He received his Ph.D. from the University of Cambridge, UK, in physics in 2009, working with David MacKay. After a period as a postdoctoral researcher at the Max Planck Campus in Tübingen, Germany, he moved to the EMBL-EBI in November 2012 to establish his own research group.
Title: Mapping single cells: A geometric approach
(This will be part of our MSR New England General Colloquium Series, intended for broad audiences of all backgrounds.)
Speaker: Dana Pe'er
Affiliation: Departments of Biological Sciences and Systems Biology, Columbia University
Host: Jennifer Listgarten
Date: Wed. Nov 5th, 2014
Time: 4:00 PM - 5:00 PM
High-dimensional single-cell technologies are on the rise, rapidly increasing in accuracy and throughput. These offer computational biology both a challenge and an opportunity. One of the big challenges with this data type is to understand regions of density in this multi-dimensional space, given millions of noisy measurements. Underlying many of our approaches is mapping this high-dimensional geometry into a nearest neighbor graph and characterizing single-cell behavior using this graph structure. We will discuss a number of approaches: (1) An algorithm that harnesses the nearest neighbor graph to order cells according to their developmental maturity, and its use to identify novel progenitor B-cell sub-populations. (2) Using reweighted density estimation to characterize cellular signal processing in T-cell activation. (3) New clustering and dimensionality reduction approaches to map heterogeneity between cells, with an application to characterizing tumor heterogeneity in Acute Myeloid Leukemia.
Dana Pe’er is an associate professor in the Departments of Biological Sciences and Systems Biology. Her team develops computational methods that integrate diverse high-throughput data to provide a holistic, systems-level view of molecular networks. Currently they have two key focuses: developing computational methods to interpret single-cell data and understand cellular heterogeneity, and modeling how genetic and epigenetic variation alters regulatory network function and subsequently phenotype in health and disease. This path has led them to explore how systems biology approaches can be used to personalize cancer care. Dana is a recipient of the Burroughs Wellcome Fund Career Award, the NIH Director's New Innovator Award, an NSF CAREER award, a Stand Up To Cancer Innovative Research Grant, a Packard Fellowship in Science and Engineering, and, very recently, the prestigious 2014 ISCB Overton Prize.
Title: Reconstructing tumour subpopulation genotypes and evolution from short-read sequencing of bulk tumour samples
Speaker: Quaid Morris
Affiliation: Donnelly Centre for Cellular and Biomolecular Research, University of Toronto
Host: Jennifer Listgarten
Date: Friday, September 12th, 2014
Time: 2:00 PM - 3:30 PM
Tumours consist of genetically diverse subpopulations of cells that differ in their response to therapy and their metastatic potential. The short read sequencing used to characterize tumour heterogeneity only provides the allelic frequencies of the tumour somatic mutations, not full genotypes of individual cells. I will describe my lab’s efforts to recover these full genotypes by fitting subpopulation phylogenies to the allele frequency data. In some circumstances, a full, unique reconstruction is possible but often multiple phylogenies are consistent with the data. Our methods (PhyloSub, PhyloWGS, treeCRP) use Bayesian inference to distinguish ambiguous and unambiguous portions of the phylogeny thereby explicitly representing reconstruction uncertainty. Our methods incorporate simple somatic mutations (point mutations and indels) as well as copy number variations; have excellent results on real and simulated data; and can take as input allele frequencies from single or multiple tumour samples where these frequencies are estimated using either targeted or whole genome sequencing.
Quaid Morris is an associate professor in the Donnelly Centre at the University of Toronto in Canada. He is a multi-disciplinary researcher with cross-appointments in the Departments of Computer Science, Engineering, and Molecular Genetics. He founded his lab in 2005, after receiving his PhD from the Massachusetts Institute of Technology (MIT) in 2003. His doctoral training was in machine learning and computational neuroscience under the supervision of Peter Dayan at MIT and the Gatsby Unit at University College London. His lab uses statistical learning to make biological discoveries and develop new methodology for analysing large-scale biomedical datasets. He is currently interested in understanding cancer (and other complex diseases) using genomics; post-transcriptional regulation; text mining of medical records; and the automated prediction of gene function (see http://www.genemania.org).
Title: The Warped Linear Mixed Model: finding optimal phenotype transformations yields a substantial increase in signal in genetic analyses
Speaker: Nicolo Fusi
Affiliation: Microsoft Research, Los Angeles
Host: Jennifer Listgarten
Date: Wed. August 20th, 2014
Time: 2:00 PM - 3:30 PM
Genome-wide association studies (GWAS), now routine, still have many open methodological problems. Among the most successful models for GWAS are linear mixed models (LMMs), also used in several other key areas of genetics, such as phenotype prediction and estimation of heritability. However, one of the fundamental assumptions of these models, that the data have a particular distribution (i.e., the noise is Gaussian-distributed), rarely holds in practice. As a result, standard approaches yield sub-optimal performance, resulting in significant losses in power for GWAS, increased bias in heritability estimation, and reduced accuracy for phenotype predictions. In this talk, I will discuss our solution to this important problem: a novel, robust and statistically principled method, the “Warped Linear Mixed Model”, which automatically learns an optimal “warping function” for the phenotype at the same time as it models the data. Our approach effectively searches through an infinite set of transformations, using the principles of statistical inference to determine an optimal one. In extensive experiments, we find up to twofold increases in GWAS power, significantly reduced bias in heritability estimation and significantly increased accuracy in phenotype prediction, as compared to the standard LMM.
Microsoft Research New England
First Floor Conference Center
One Memorial Drive, Cambridge, MA
(directions can be found here)
Upon arrival, be prepared to show a picture ID and sign the Building Visitor Log when approaching the Lobby Floor Security Desk. Alert them to the name of the event you are attending and ask them to direct you to the appropriate floor. Talks are typically held in the First Floor Conference Center, but the location may occasionally change.
Guests are allowed to park in our garage located at One Memorial Drive. Microsoft receptionists will not validate parking for any guests. All day parking is $27.00 on weekdays and $10.00 on weekends. Please note that these rates are subject to change.
*Hospitality Notice: Microsoft Research may provide hospitality at this event. Because different universities and legal jurisdictions have differing rules, we rely on you to know whether acceptance of this invitation would be inconsistent with those rules. Accordingly, by accepting our invitation, you confirm that this invitation is compliant with your institution's policies.
To subscribe to talk announcements for this series, send a message to firstname.lastname@example.org and enter subscribe msrne-cb-announce in the body of the message. If you have any questions or concerns please send us an email.