Computational Biology Seminar Series

An occasional forum for delivering academic computational biology talks. All talks are open to the public.

Upcoming Speakers and Events

Title: TBD

(This will be part of our MSR New England General Colloquium Series, intended for broad audiences of all backgrounds.)

Speaker: Barbara Engelhardt

Affiliation: Princeton

Host: Jennifer Listgarten

Date: Wed. November 4th, 2015

Time: 4pm - 5pm with reception to follow




Barbara completed her BS and MS at Stanford University in 1999, and her PhD at the University of California, Berkeley in 2007, advised by Michael Jordan. She did a postdoc at the University of Chicago working with Matthew Stephens from 2008 to 2012, and was an Assistant Professor at Duke University from 2012 to 2014. She also spent two years working at the Jet Propulsion Laboratory (1999-2001), a summer at Google Research (2005), and a year at 23andMe (2007-2008).


Title: Computational Aspects of Biological Information 2015

Computational Aspects of Biological Information (CABI) 2015 is the third one-day workshop on challenges and successes in computational biology and will bring together experts in the Boston/Cambridge area to discuss computational solutions to problems in biology, including systems biology, genomics, and related areas.

The workshop is open to everyone and registration is free. Continental breakfast and lunch will be served.

Tuesday, December 1, 2015
Microsoft Research New England
Horace Mann Conference Room
First Floor Conference Center
One Memorial Drive, Cambridge, Mass.

Registration and breakfast begin at 9:00 a.m.


Registration is now open

Poster session

There will be a poster session in the afternoon. To submit a poster, please review the guidelines and submission information by November 1st. Space is limited, and accepted poster notifications will be sent by November 10th.

Workshop speakers

Confirmed speakers include:

  • Bonnie Berger (MIT CSAIL)
  • Arup Chakraborty (MIT Chemistry)
  • Michael Desai (Harvard Systems Biology)
  • Polina Golland (MIT CSAIL)
  • Rafael Irizarry (Harvard University)
  • Leonid Mirny (Harvard-MIT Division of Health Sciences and Technology, MIT)
  • Peter Park (Harvard Medical School)
  • David Sontag (NYU)
  • Shamil Sunyaev (Harvard Medical School)

Organizing committee

Nicolo Fusi (Microsoft Research)
Jennifer Listgarten (Microsoft Research)
James Zou (Microsoft Research)

For the most up-to-date information, please go to the CABI web site at:

Past Speakers

Title: Personalized Health with Gaussian Processes

(This will be part of our MSR New England General Colloquium Series, intended for broad audiences of all backgrounds.)

Speaker: Neil Lawrence

Affiliation: University of Sheffield

Host: Nicolo Fusi

Date: Wed, Aug 19

Time: 4pm - 5pm with reception to follow


Modern data connectivity gives us different views of the patient which need to be unified for truly personalized health care. I'll give a personal perspective on the types of methodological and social challenges we expect to arise in this domain and motivate Gaussian process models as one approach to dealing with the explosion of data.


Neil Lawrence received his bachelor's degree in Mechanical Engineering from the University of Southampton in 1994. Following a period as a field engineer on oil rigs in the North Sea, he returned to academia to complete his PhD in 2000 at the Computer Lab in Cambridge University. He spent a year at Microsoft Research in Cambridge before leaving to take up a Lectureship at the University of Sheffield, where he was subsequently appointed Senior Lecturer in 2005. In January 2007 he took up a post as a Senior Research Fellow at the School of Computer Science in the University of Manchester, where he worked in the Machine Learning and Optimisation research group. In August 2010 he returned to Sheffield to take up a collaborative Chair in Neuroscience and Computer Science.

Neil's main research interest is machine learning through probabilistic models. He focuses on both the algorithmic side of these models and their application. He has a particular focus on applications in personalized health and computational biology, but happily dabbles in other areas such as speech, vision and graphics.

Neil was Associate Editor in Chief for IEEE Transactions on Pattern Analysis and Machine Intelligence (from 2011-2013) and is an Action Editor for the Journal of Machine Learning Research. He was the founding editor of the JMLR Workshop and Conference Proceedings (2006) and is currently series editor. He was an area chair for the NIPS conference in 2005, 2006, 2012 and 2013, Workshops Chair in 2010 and Tutorials Chair in 2013. He was General Chair of AISTATS in 2010 and AISTATS Programme Chair in 2012. He was Program Chair of NIPS in 2014 and is General Chair for 2015.


Title: Modeling molecular heterogeneity between individuals and single cells

Speaker: Oliver Stegle

Affiliation: European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI)

Host: Jennifer Listgarten and Nicolo Fusi

Date: Monday, May 11th, 2015

Time: 2:00 PM - 3:00 PM


The analysis of large-scale expression datasets is often compromised by hidden structure between samples. In the context of genetic association studies, this structure can be linked to differences between individuals, which can reflect their genetic makeup (such as population structure) or be traced back to environmental and technical factors. In this talk, I will discuss statistical methods to reconstruct this structure from the observed data to account for it in genetic analyses. By incorporating principles from causal reasoning, we show that critical pitfalls of falsely explaining away true biological signals can be circumvented. In the second part of this talk, I will extend the introduced class of latent variable models to account for unwanted heterogeneity in single-cell transcriptome datasets. In applications to a T helper cell differentiation study, we show how this model allows for dissecting expression patterns of individual genes and reveals new substructure between cells that is linked to cell differentiation. I will finish with an outlook on modeling challenges and initial solutions that enable combining multiple omics layers that are profiled in the same set of single cells.


Oliver Stegle is a group leader at the European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI) in Cambridge, UK. His group develops statistical methods to analyse high-dimensional molecular traits both in the context of genetic association and single-cell biology. He received his Ph.D. from the University of Cambridge, UK, in physics in 2009, working with David MacKay. After a period as a postdoctoral researcher at the Max Planck Campus in Tübingen, Germany, he moved to the EMBL-EBI in November 2012 to establish his own research group.



Title: Mapping single cells: A geometric approach

(This will be part of our MSR New England General Colloquium Series, intended for broad audiences of all backgrounds.)

Speaker: Dana Pe'er

Affiliation: Departments of Biological Sciences and Systems Biology, Columbia University

Host: Jennifer Listgarten

Date: Wed. Nov 5th, 2014

Time: 4:00 PM - 5:00 PM  


High-dimensional single-cell technologies are on the rise, rapidly increasing in accuracy and throughput. These offer computational biology both a challenge and an opportunity. One of the big challenges with this data type is to understand regions of density in this multi-dimensional space, given millions of noisy measurements. Underlying many of our approaches is mapping this high-dimensional geometry into a nearest neighbor graph and characterizing single-cell behavior using this graph structure. We will discuss a number of approaches: (1) An algorithm that harnesses the nearest neighbor graph to order cells according to their developmental maturity, and its use to identify novel progenitor B-cell sub-populations. (2) Using reweighted density estimation to characterize cellular signal processing in T-cell activation. (3) New clustering and dimensionality reduction approaches to map heterogeneity between cells, with an application to characterizing tumor heterogeneity in Acute Myeloid Leukemia.


Dana Pe’er is an associate professor in the Departments of Biological Sciences and Systems Biology. Her team develops computational methods that integrate diverse high-throughput data to provide a holistic, systems-level view of molecular networks. Currently they have two key focuses: developing computational methods to interpret single-cell data and understand cellular heterogeneity, and modeling how genetic and epigenetic variation alters regulatory network function and subsequently phenotype in health and disease. This path has led them to explore how systems biology approaches can be used to personalize cancer care. Dana is a recipient of the Burroughs Wellcome Fund Career Award, the NIH Director’s New Innovator Award, an NSF CAREER award, a Stand Up To Cancer Innovative Research Grant, a Packard Fellowship in Science and Engineering, and, very recently, the prestigious 2014 ISCB Overton Prize.



Title: Reconstructing tumour subpopulation genotypes and evolution from short-read sequencing of bulk tumour samples

Speaker: Quaid Morris

Affiliation: Donnelly Centre for Cellular and Biomolecular Research, University of Toronto

Host: Jennifer Listgarten

Date: Friday, September 12th, 2014

Time: 2:00 PM - 3:30 PM  


Tumours consist of genetically diverse subpopulations of cells that differ in their response to therapy and their metastatic potential. The short read sequencing used to characterize tumour heterogeneity only provides the allelic frequencies of the tumour somatic mutations, not full genotypes of individual cells. I will describe my lab’s efforts to recover these full genotypes by fitting subpopulation phylogenies to the allele frequency data. In some circumstances, a full, unique reconstruction is possible but often multiple phylogenies are consistent with the data. Our methods (PhyloSub, PhyloWGS, treeCRP) use Bayesian inference to distinguish ambiguous and unambiguous portions of the phylogeny thereby explicitly representing reconstruction uncertainty. Our methods incorporate simple somatic mutations (point mutations and indels) as well as copy number variations; have excellent results on real and simulated data; and can take as input allele frequencies from single or multiple tumour samples where these frequencies are estimated using either targeted or whole genome sequencing.


Quaid Morris is an associate professor in the Donnelly Centre at the University of Toronto in Canada. He is a multi-disciplinary researcher with cross-appointments in the Departments of Computer Science, Engineering, and Molecular Genetics. He founded his lab in 2005, after receiving his PhD from the Massachusetts Institute of Technology (MIT) in 2003. His doctoral training was in machine learning and computational neuroscience under the supervision of Peter Dayan at MIT and the Gatsby Unit at University College London. His lab uses statistical learning to make biological discoveries and develop new methodology for analysing large-scale biomedical datasets. He is currently interested in understanding cancer (and other complex diseases) using genomics; post-transcriptional regulation; text mining of medical records; and the automated prediction of gene function (see


Title: The Warped Linear Mixed Model: finding optimal phenotype transformations yields a substantial increase in signal in genetic analyses

Speaker: Nicolo Fusi

Affiliation: Microsoft Research, Los Angeles

Host: Jennifer Listgarten

Date: Wed. August 20th, 2014

Time: 2:00 PM - 3:30 PM  


Genome-wide association studies (GWAS), now routine, still have many open methodological problems. Among the most successful models for GWAS are linear mixed models (LMMs), also used in several other key areas of genetics, such as phenotype prediction and estimation of heritability. However, one of the fundamental assumptions of these models—that the data have a particular distribution (i.e., the noise is Gaussian-distributed)—rarely holds in practice. As a result, standard approaches yield sub-optimal performance, resulting in significant losses in power for GWAS, increased bias in heritability estimation, and reduced accuracy for phenotype predictions. In this talk, I will discuss our solution to this important problem—a novel, robust and statistically principled method, the “Warped Linear Mixed Model”—which automatically learns an optimal “warping function” for the phenotype at the same time as it models the data. Our approach effectively searches through an infinite set of transformations, using the principles of statistical inference to determine an optimal one. In extensive experiments, we find up to twofold increases in GWAS power, significantly reduced bias in heritability estimation and significantly increased accuracy in phenotype prediction, as compared to the standard LMM.




