Jonathan M. Carlson
Jonathan M. Carlson, Ph.D.
One Microsoft Way
Redmond, WA 98052
United States of America
I am a Researcher in the eScience Research group at Microsoft Research. I am interested in the development and application of statistical and machine learning methods to biological data, working in close collaboration with biological and medical researchers. I am especially interested in applications in immunology, virology and genetics, as well as the development of more general techniques that transcend specific disciplines. My current primary focus is to develop statistical models that help us understand HIV's adaptive response to the immune system. Although the ability of HIV to mutate and adapt is one of the primary challenges for vaccine and drug design, it also poses an interesting opportunity for basic research: because HIV rapidly adapts to changing environments, we can study HIV adaptation to learn about the workings of the virus and the immune system, furthering our general understanding of immunology and virology while working toward finding exploitable weaknesses to leverage in a vaccine.
I joined the eScience group at Microsoft as a researcher in 2008 and have been focusing on applied machine learning and statistics for HIV ever since. My approach is fundamentally interdisciplinary: I seek to use computational tools to gain real biological insight, and I always work in close collaboration with biologists. My models of viral escape have achieved broad recognition in the HIV community, where they have led to the discovery of novel viral-host interactions, insights into mechanisms of natural immune control, and the identification of vaccine candidates that are moving toward clinical trials. I have co-authored over 60 papers in the field with dozens of labs, including publications in Science and Nature, and have served on advisory panels and committees for the Institute of Medicine, the Gates Foundation and the Center for HIV/AIDS Vaccine Immunology. I received my BA in 2003 from Dartmouth, where I was awarded the top senior thesis prizes in both biology and computer science, and my Ph.D. in computer science in 2009 from the University of Washington, where I received the university's distinguished dissertation award and was a finalist for the US Council of Graduate Schools' dissertation award.
HIV is characterized by a tremendous rate of mutation, which leads to a high level of genetic diversity within and among patients. Yet transmission is frequently (in ~90% of cases) established by a single genetic variant. What, if anything, is so special about that "founder" virus? In close collaboration with Eric Hunter's group at Emory, we showed in a paper published today in Science (in an issue with an excellent special focus on HIV) that founder viruses are characterized by higher predicted fitness. Strikingly, this fitness bottleneck is stronger in men than in women, consistent with other data showing that women are more easily infected than men, unless the man had genital inflammation. As noted by Sarah Joseph and Ron Swanstrom in their Perspective piece, this means that exposure frequently results in the non-productive infection of target cells. That provides a window in which stronger viruses can outcompete weaker ones, as well as a window in which vaccines or drugs may be able to raise the bar for any virus to establish systemic infection. It also raises an interesting and important paradox: individuals with higher biological risk are more likely to be infected, but when they are infected, it's more likely to be by a weaker virus, leading to less severe disease. Among other things, this predicts that as the standard of care moves toward preventative treatment of high-risk individuals, a side effect of reduced transmission rates will be an increase in the severity of disease when breakthrough infection does occur, something we'll have to keep a close eye on.
This work was the result of a truly collaborative effort. The primary data came from Susan Allen's Zambian discordant couple cohort, a visionary cohort established two decades ago that enrolls, counsels, and provides condoms to cohabiting heterosexual couples in which one partner is HIV+ and the other is HIV-. Eric's group, in collaboration with many folks from UAB, has been studying the genetics and molecular virology of transmission for a number of years. To this great team, we at MSR brought a big-data, statistical perspective. In this case, no new methods were needed: we just needed to cleverly apply the established approach of generalized linear mixed models (GLMMs) to tease out effects that had been hiding in the data all along.
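For readers unfamiliar with GLMMs, a generic random-intercept logistic GLMM has the form below. Here y_{ij} is a binary outcome for observation i in group j (for example, one transmission pair), x_{ij} holds the fixed-effect covariates with coefficients β, and u_j is a group-level random effect that absorbs the correlation among observations from the same group. This is only an illustrative template under those assumptions; the actual outcomes, covariates and random-effect structure used in the paper are not described here.

```latex
\operatorname{logit} \Pr(y_{ij} = 1 \mid u_j) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```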
This story has been picked up by the blogosphere, with independent stories by NPR, Healthline, Emory, DW (German), Guokr (Chinese), and SINC (Spanish). If you don't have access to Science, you can read the authors' version (pdf), which is the accepted version of the paper before editorial copyediting and layout. Update: Science has a great program through which any visitor to the corresponding author's website can access the official papers without going through the paywall. So here you go: abstract, full text, reprint (pdf).
In collaboration with Mary Carrington's group, we showed in Science that the quantity of HLA-C protein on the cell surface correlates with HIV disease progression, with the probability that HLA-C epitopes will be targeted by the immune system, and with the probability that HIV will escape within those epitopes. Mary further showed that HLA-C expression levels are linked to some autoimmune diseases. This is an important study that highlights the role of HLA-C (which is generally ignored in the field) and demonstrates how our models of selection can be used to generate and test hypotheses. As always, this was a hugely collaborative effort, making key use of data from Philip Goulder, Zabrina Brumme and many others. The MSR Connections team wrote a blog post providing a nice high-level description.
As part of the IHAC collaboration, today we published the largest HIV escape study ever done. To identify HLA escape mutations, we studied full HIV proteomes from 1,888 antiretroviral-naive individuals chronically infected with HIV clade B. These data will be a major resource for the community going forward: they update the current standard we had previously published by adding more individuals, covering the full proteome, and applying new methods. But we also show some important new insights into HIV escape. For example, we now know that escape typically happens at anchor residues, and that a hallmark of protective HLA alleles is the ability to drive escape across the proteome, especially at anchors. What's special about anchors? Primarily, once HIV mutates there, the T-cell receptor (TcR) is unlikely to be able to adapt, since anchor mutations typically disrupt binding of the peptide to the HLA molecule itself. Why would good HLAs drive escape at anchors? Presumably it's a marker of a very broad TcR response, such that anchors are the only viable escape option. As always, this was a broadly collaborative effort, driven by Zabrina Brumme and myself, in collaboration with Chanson Brumme, Richard Harrigan, Simon Mallal and Mina John. Amusingly enough, a stylized version of a supplemental figure ended up on the cover of the January 2013 issue.
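Stripped of the phylogenetic correction a real study needs, the basic question behind calling an HLA escape mutation at one site is whether a polymorphism is over-represented among individuals who carry a given HLA allele. The sketch below, a one-sided Fisher's exact test on a 2x2 table, illustrates only that naive question; the function name and table layout are hypothetical, and it deliberately ignores the fact that related viruses share polymorphisms by descent, which can produce spurious associations.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table

                 escaped   not escaped
        HLA+        a           b
        HLA-        c           d

    Returns P(X >= a) under the hypergeometric null of no association,
    i.e. the chance of seeing at least `a` escaped HLA+ individuals by luck.
    No phylogenetic correction is applied.
    """
    row1 = a + b            # individuals carrying the HLA allele
    col1 = a + c            # individuals carrying the polymorphism
    n = a + b + c + d       # all individuals
    denom = comb(n, row1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        # Probability of exactly k escaped individuals among the HLA+ group.
        p += comb(col1, k) * comb(n - col1, row1 - k) / denom
    return p
```

A small p-value (for a hypothetical table such as `fisher_exact_greater(30, 20, 40, 160)`) suggests the polymorphism is enriched among HLA carriers, but without accounting for the phylogeny it cannot distinguish HLA-driven escape from shared ancestry.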
This paper marks the development and introduction of our phylogenetically corrected logistic regression algorithm. It allows us to do all the standard logistic regression analyses, such as testing for differential effects or measuring effect sizes, while correcting for phylogenetic structure. You can use the tool yourself here, though we have to limit it to single analyses. If you'd like an executable version of the code, email me. We're working on a better, scalable solution, so stay tuned.
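The phylogenetic correction itself is not reproduced here. As a reference point, the following is a minimal sketch of the uncorrected model it builds on: ordinary logistic regression of a binary outcome (for example, whether a patient's virus carries a given polymorphism) on a single binary covariate (for example, whether the patient carries a given HLA allele), fitted by Newton-Raphson. The function name and the example data are hypothetical.

```python
import math


def fit_logistic(x, y, iters=50, tol=1e-12):
    """Ordinary (NOT phylogenetically corrected) logistic regression
    y ~ Bernoulli(sigmoid(b0 + b1 * x)), fitted by Newton-Raphson.
    Returns (b0, b1); exp(b1) is the odds ratio for x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and Fisher information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - p
            w = p * (1.0 - p)
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: information^{-1} @ score, using the explicit 2x2 inverse.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

With `x = [0, 0, 0, 0, 1, 1, 1, 1]` and `y = [0, 0, 0, 1, 1, 1, 1, 0]`, the fit recovers b0 = log(1/3) and b1 = 2 log 3 (an odds ratio of 9), matching the closed-form answer for a single binary predictor. A phylogenetically corrected version would additionally have to account for polymorphisms that patients share because their viruses are related.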
We used this approach to look at an interesting phenomenon: although we like to group HLA alleles by their tendency to bind similar epitopes, we find that, in vivo, the escape mutations that evolution selects for differ by HLA. For example, when B*57:03 and B*57:02 (two very similar HLA alleles) present the same epitope, the observed escape mutations are usually different. This is very surprising, as it forces us to think more carefully about how (and whether) we group alleles, as well as about the role of differential escape. This work was in collaboration with Philip Goulder, John Frater, Roger Shapiro and Thumbi Ndung'u and their labs.
Teaming up with Galit Alter and Marcus Altfeld from the Ragon Institute, we showed that HIV is adapting to the NK-cell-mediated immune response. We used PhyloD to identify HIV polymorphisms that are enriched among patients who express certain KIR genes. These associations imply that HIV is adapting to something specific in these individuals. Indeed, NK cells are activated and inhibited by their KIR proteins, which bind to HLA-epitope complexes. It looks like HIV mutates to manipulate these interactions, effectively shutting down the natural killer cells. What a great example of how we can start from adaptation, then work backward to figure out what's going on!
From time to time, the protein translational machinery gets messed up, slipping a bit so that translation happens out of frame. The result is, of course, dysfunctional protein fragments, which quickly get chewed up by the proteasome. We wondered whether some of these were being presented as epitopes. If so, then HIV would of course escape (it always does!), and we would thus be able to find them by looking for HLA-mediated escape in non-primary reading frames. In back-to-back papers in the Journal of Experimental Medicine, we showed that this is indeed what's happening. In the first paper, in collaboration with Christian Brander's group, we did a deep dive into one epitope, showing that the "cryptic epitope" was expressed via translation from an alternative start site, at a codon that would normally encode a lysine. In an independent paper with Paul Goepfert's group, we showed that cryptic epitopes are frequently targeted, especially in antisense reading frames (i.e., frames read "backwards" from the complementary strand of the genome). This is another great example of using HIV adaptation as a starting point for learning something fundamentally new about how our immune systems interact with viruses. It will be interesting to see whether these lead to new vaccine targets. These papers were picked up by several news aggregators and bloggers.
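Looking for escape in non-primary reading frames presupposes translating each sequence in those frames. The following is a minimal sketch of six-frame translation (three forward frames plus three frames on the reverse complement, the latter corresponding to the antisense frames) using the standard genetic code. The function names are hypothetical, ambiguous or unknown codons are rendered as `X`, and the alternative-start-site mechanism described above is not modeled.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1): codons enumerated with
# bases ordered T, C, A, G and the first codon position varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """The complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; '*' marks a stop, 'X' an unknown/ambiguous codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translations of the forward frames (+1..+3) and the
    reverse-complement (antisense) frames (-1..-3)."""
    seq = seq.upper().replace("U", "T")
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For a toy input such as `six_frame_translation("ATGAAA")`, frame `+1` is `"MK"` and frame `-1` (from the reverse complement `TTTCAT`) is `"FH"`; in a real scan, each non-primary frame would then be tested for HLA-associated polymorphisms just like the primary one.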
This paper introduced the Phylogenetic Dependency Network framework. The idea is that we build independent models of evolution for each amino acid in an HIV protein. Each of those models is parameterized by the phylogenetic structure, the rate of evolution in the absence of escape, and a model of adaptation in the leaves of the phylogeny. Crucially, we assume that adaptation exists only in the leaves (i.e., the observed patients). This is clearly wrong, but quite useful in that it keeps the number of parameters linear, and empirically it's a decent approximation, as we showed previously. This paper is the foundation of all of our HIV escape work and has been cited numerous times. An image from this work was used as the cover image for this article, as well as for the PLoS T-shirt logo at the 2009 ISMB conference.
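To make the "adaptation only in the leaves" assumption concrete, here is a toy sketch for a single site. A binary character (1 = escaped, 0 = unescaped) evolves along the tree under a two-state continuous-time Markov model with stationary escape frequency `pi1` and rate `rate`, and HLA-mediated escape is applied only at the leaves; the likelihood is computed with Felsenstein's pruning algorithm. The tree encoding, the parameter names and the specific leaf model (an HLA carrier turns an unescaped state into an observed escape with probability `escape_prob`, with no reversion) are hypothetical simplifications, not the published parameterization.

```python
import math


def transition(i, j, t, pi1, rate):
    """Two-state CTMC: P(state j at the end of a branch of length t | state i at its start)."""
    pj = pi1 if j == 1 else 1.0 - pi1
    same = 1.0 if i == j else 0.0
    return pj + (same - pj) * math.exp(-rate * t)


def leaf_likelihood(observed, has_hla, escape_prob):
    """[P(observed | hidden=0), P(observed | hidden=1)] at a leaf (hypothetical leaf model).
    Adaptation happens only here: in an HLA carrier, an unescaped hidden state
    is observed as escaped with probability escape_prob; no reversion."""
    e = escape_prob if has_hla else 0.0
    return [e, 1.0] if observed == 1 else [1.0 - e, 0.0]


def partial_likelihood(node, data, pi1, rate, escape_prob):
    """Felsenstein pruning: [P(data below node | node state = s) for s in (0, 1)]."""
    if "name" in node:  # leaf
        observed, has_hla = data[node["name"]]
        return leaf_likelihood(observed, has_hla, escape_prob)
    result = [1.0, 1.0]
    for child in node["children"]:
        child_lik = partial_likelihood(child, data, pi1, rate, escape_prob)
        for s in (0, 1):
            result[s] *= sum(
                transition(s, c, child["length"], pi1, rate) * child_lik[c]
                for c in (0, 1)
            )
    return result


def log_likelihood(tree, data, pi1, rate, escape_prob):
    """Log-likelihood of the leaf observations; root state drawn from the stationary distribution."""
    root = partial_likelihood(tree, data, pi1, rate, escape_prob)
    return math.log((1.0 - pi1) * root[0] + pi1 * root[1])


# Hypothetical four-patient tree: leaves have a "name", every non-root node a branch "length".
tree = {"children": [
    {"name": "p1", "length": 0.1},
    {"name": "p2", "length": 0.1},
    {"length": 0.2, "children": [{"name": "p3", "length": 0.05},
                                 {"name": "p4", "length": 0.05}]},
]}
# Made-up data: (observed escape state, carries the HLA allele) for each patient.
data = {"p1": (1, True), "p2": (0, False), "p3": (1, True), "p4": (0, False)}
```

Maximizing `log_likelihood` over `pi1`, `rate` and `escape_prob`, and comparing the result with the best fit under `escape_prob = 0`, would give a likelihood-ratio test for HLA-associated escape at this site. Because adaptation enters only through the leaf term, it adds just one parameter per site in this toy version.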