Jonathan M. Carlson
Jonathan M. Carlson, Ph.D.
One Microsoft Way
Redmond, WA 98052
United States of America
I am a Researcher in the eScience Research group at Microsoft Research. I am interested in the development and application of statistical and machine learning methods to biological data, working in close collaboration with biological and medical researchers. I am especially interested in applications in immunology, virology and genetics, as well as the development of more general techniques that transcend specific disciplines. My current primary focus is to develop statistical models that help us understand HIV's adaptive response to the immune system. Although the ability of HIV to mutate and adapt is one of the primary challenges for vaccine and drug design, it also presents an interesting opportunity for basic research: because HIV rapidly adapts to changing environments, we can study HIV adaptation to learn about the workings of the virus and the immune system, furthering our general understanding of immunology and virology while working toward finding exploitable weaknesses to leverage in a vaccine.
I joined the eScience group at Microsoft as a researcher in 2008, and have been focusing on applied machine learning and statistics for HIV ever since. My approach is fundamentally interdisciplinary, as I seek to use computational tools to gain real biological insight and always work in close collaboration with biologists. My models of viral escape have achieved broad recognition in the HIV community, where they have led to the discovery of novel viral-host interactions, insights into mechanisms of natural immune control, and the identification of vaccine candidates that are moving toward clinical trials. I have co-authored over 60 papers in the field with dozens of labs, including publications in Science and Nature, and have served on advisory panels and committees for the Institute of Medicine, the Gates Foundation and the Center for HIV/AIDS Vaccine Immunology. I received my BA in 2003 from Dartmouth, where I was awarded the top senior thesis prizes in both biology and computer science, and my Ph.D. in computer science in 2009 from the University of Washington, where I was awarded the university's distinguished dissertation award and was a finalist for the Council of Graduate Schools' dissertation award.
My take on some of my favorite papers I've been involved with. Click on a title to get more info and links to resources, collaborators, papers, etc.
Here's a provocative thought: as we roll out drugs to the sickest people first, are we selecting for weaker viruses, i.e., those that don't make people sick and thus are less likely to be subjected to drug therapy? We don't have direct evidence for this, but when we compare Botswana to South Africa, we see higher CD4 counts (healthier immune systems) per level of viral load (viral concentration) or viral replicative capacity (how well the virus grows in a lab). Perhaps related (perhaps not), we also see an increased burden of circulating HLA escape mutations. At the very least, this increased burden appears to have wiped out B*57's ability to modulate relative viral control. Might it also have weakened the virus?
This paper provides a strong rationale for vaccine design: (1) it's critical to target specific epitopes; (2) those epitopes need to be ones where mutation comes at a cost; and (3) protein structure is a great way to predict which epitopes will be costly. We did this in collaboration with Florencia Pereyra and Bruce Walker at the Ragon Institute. The idea was to test a group of natural controllers and normal non-controllers to see which epitopes they target, whether that explains control, and what characterizes good epitopes.
HIV is characterized by a tremendous rate of mutation, which leads to a high level of genetic diversity within and among patients. Yet transmission is frequently (~90%) established by a single genetic variant. What (if anything) is so special about that "founder" virus? In short, fitter viruses are more likely to be transmitted. This has two major implications: (1) there are likely many nonproductive infection events that happen at the site of exposure (otherwise, where is the substrate for competition?); (2) as you raise the bar for infection (that is, make it less likely you'll be infected), you increase the risk that a breakthrough infection will cause more severe disease.
In collaboration with Mary Carrington's group, we showed in Science that the quantity of HLA-C surface protein correlates with HIV disease progression, the probability that HLA-C epitopes will be targeted by the immune system, and the probability that HIV will escape within those epitopes. Mary further showed that HLA-C expression levels are linked to some auto-immune diseases. This is an important study that highlights the role of HLA-C (which is generally ignored in the field) and demonstrates how our models of selection can be used to generate and test hypotheses. As always, this was a hugely collaborative effort, making key use of data from Philip Goulder, Zabrina Brumme and many others. The MSR Connections team wrote a blog post providing a nice high-level description.
As part of the IHAC collaboration, we published the largest HIV escape study done to date. To identify HLA escape mutations, we studied the full HIV proteomes of 1,888 individuals chronically infected with HIV clade B who had never been given drugs. This will be a useful resource to the community, and it also yielded some important new insights into HIV escape. For example, we now know that escape typically happens at anchor residues and that a hallmark of protective HLA alleles is the ability to drive escape across the proteome, especially at anchors.
This paper marks the development and introduction of our phylogenetically corrected logistic regression algorithm. It lets us run all the standard logistic regression analyses (testing for differential effects, measuring effect sizes) while correcting for phylogenetic structure. You can use the tool yourself here, though we have to limit it to single analyses. If you'd like an executable version of the code, email me. We're working on a better, scalable solution, so stay tuned.
We used this approach to look at an interesting phenomenon: although we like to group HLA alleles by their tendency to bind similar epitopes, we find that, in vivo, the escapes that evolution selects for differ by HLA. For example, when B*57:03 and B*57:02 (two very similar HLA alleles) present the same epitope, the observed escape mutations are usually different. This is very surprising, as it forces us to think more carefully about how (and if) we group alleles, as well as what the role of differential escape is. This work was in collaboration with Philip Goulder, John Frater, Roger Shapiro and Thumbi Ndung'u and their labs.
Teaming up with Galit Alter and Marcus Altfeld from the Ragon Institute, we showed that HIV is adapting to the NK-cell-mediated immune response. We used PhyloD to identify HIV polymorphisms that are enriched among patients who express certain KIR genes. These associations imply that HIV is adapting to something specific in these individuals. In fact, NK-cells are activated and inhibited by their KIR proteins, which bind to HLA-epitope complexes. It looks like HIV mutates to manipulate these interactions, effectively shutting down the Natural Killer cells. What a great example of how we can start from adaptation, then work backward to figure out what's going on!
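To make "enriched among patients who express certain KIR genes" a bit more concrete, here is a minimal sketch of a plain one-sided Fisher exact test on a 2x2 table of patient counts. This is not PhyloD: PhyloD additionally corrects for the phylogenetic structure of the viral sequences, which this sketch does not attempt. The function name, argument names and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
from math import comb


def enrichment_p_value(with_both, with_gene_only, with_variant_only, with_neither):
    """One-sided Fisher exact test on a 2x2 table of patient counts.

    Returns the probability of seeing at least `with_both` patients who carry
    both the host gene and the viral polymorphism, assuming the two are
    independent. No phylogenetic correction is applied (unlike PhyloD).
    """
    n_gene = with_both + with_gene_only          # patients expressing the gene
    n_variant = with_both + with_variant_only    # patients whose virus has the variant
    n_total = n_gene + with_variant_only + with_neither

    # Hypergeometric tail: sum over overlaps at least as large as observed.
    max_overlap = min(n_gene, n_variant)
    favourable = sum(
        comb(n_gene, k) * comb(n_total - n_gene, n_variant - k)
        for k in range(with_both, max_overlap + 1)
    )
    return favourable / comb(n_total, n_variant)


if __name__ == "__main__":
    # Made-up counts: 12 patients with gene and variant, 8 with the gene only,
    # 5 with the variant only, 40 with neither.
    print(enrichment_p_value(12, 8, 5, 40))
```

A small p-value here only says the polymorphism and the gene co-occur more often than chance would predict; ruling out shared ancestry as the explanation is exactly what the phylogenetic correction is for.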
From time to time, the protein translational machinery gets messed up, slipping a bit so that translation happens out of frame. The result is of course dysfunctional protein fragments, which quickly get chewed up by the proteasome. We wondered if some of these were being presented as epitopes. If so, then HIV would of course escape (it always does!), and we would thus be able to find them by looking for HLA-mediated escape in non-primary reading frames. In back-to-back papers in the Journal of Experimental Medicine, we showed that this is indeed what's happening. In collaboration with Christian Brander's group, the first paper did a deep dive into one epitope, showing that the "cryptic epitope" was expressed by translation from an alternative start site that would normally encode a lysine. In an independent paper with Paul Goepfert's group, we showed that cryptic epitopes are frequently targeted, especially in antisense reading frames (i.e., the genome was transcribed "backwards" on the 3' strand). This is another great example of using HIV adaptation as a starting point for learning something fundamentally new about how our immune systems interact with viruses. It will be interesting to see if these lead to new vaccine targets. These papers were picked up by several news aggregators and bloggers.
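For readers less familiar with reading frames, the following sketch shows what "non-primary" and "antisense" frames mean computationally: the same nucleotide sequence can be read in three forward offsets and three offsets on the reverse complement, each giving a different peptide. This is a generic illustration using the standard genetic code, not the analysis pipeline from the papers; the function names and the toy sequence are assumptions made for the example.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the antisense strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at `offset`; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def six_frame_translations(dna):
    """Translate all three sense (+) and three antisense (-) reading frames."""
    dna = dna.upper()
    antisense = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(antisense, offset)
    return frames


if __name__ == "__main__":
    # Toy sequence, not taken from HIV.
    for name, peptide in six_frame_translations("ATGAAAGGGTAA").items():
        print(name, peptide)
```

Here "+1" plays the role of the primary frame; the other five are the alternative frames in which a search for HLA-associated escape would look for cryptic epitopes.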
This paper introduces the Phylogenetic Dependency Network framework. The idea is that we build independent models of evolution for each amino acid in an HIV protein. Each of those models is parameterized by the phylogenetic structure, the rate of evolution in the absence of escape, and a model of adaptation in the leaves of the phylogeny. Crucially, we assume that adaptation exists only in the leaves (i.e., the observed patients). This is clearly wrong, but quite useful in that it keeps the number of parameters linear, and empirically it's a decent approximation, as we showed previously. This paper is the foundation of all of our HIV escape work and has been cited numerous times. The image on the left was used as the cover image for this article, as well as the PLoS T-shirt logo for the 2009 ISMB conference.