Identifying the effects of human immune pressure on the evolution of HIV.

An important open question in HIV vaccine design is how much of HIV’s mutation is a predictable response to the selection pressure imposed by the human immune system. If mutation is heavily influenced by immune pressure and follows predictable escape routes, then we can use those routes to design a vaccine payload that pre-trains the immune system to recognize and block them. In this work, we concentrate on HIV’s response to selection pressure from the cellular arm of the immune system. Identification of predictable escape paths is facilitated by two facts. One, the HLA-I system is highly diverse: there are hundreds of HLA-I A, B, and C alleles, and each of us carries one or two distinct alleles at each of these three loci. Two, an epitope that can be presented by one HLA-I molecule usually cannot be presented by another. Thus, if selection pressure from the cellular arm is substantial, the HIV escapes (point mutations) that we see in one individual are likely to be different from those we see in another individual. By identifying correlations between the presence or absence of particular HLA alleles and the presence or absence of particular HIV mutations in a population of infected people, we can identify the effects of immune pressure. Existing methods for identifying such correlations have shown that the effects of immune pressure exist but are not particularly strong. These methods, however, assume that the data (the observed sequences) are exchangeable. This assumption is a poor one, because the HIV sequences are related through their phylogeny (evolutionary tree). We have developed an approach that takes the phylogeny into account and have applied it to several real data sets. Using known epitopes as a benchmark, we find that our approach is significantly more accurate at distinguishing true from false correlations than previous methods that assume the data to be exchangeable.
Furthermore, studies on synthetic data indicate that our approach is better calibrated; that is, it estimates its false positive rate more accurately than previous methods. Most importantly, we use a combination of real and synthetic data to demonstrate that selection pressure from the cellular arm is much stronger than previously estimated.
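
To make the exchangeability assumption concrete, the following is a minimal sketch of the kind of test a phylogeny-unaware analysis might run: a two-sided Fisher exact test on a 2x2 table of HLA allele presence against mutation presence. The choice of Fisher's test, the function name, and the counts are illustrative assumptions, not taken from the work described above. The sketch treats every sequence as an independent draw, which is exactly what the phylogeny calls into question, and it is not the phylogeny-aware approach itself.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: HLA allele present / absent.
    Columns: HIV mutation present / absent.
    Assumes the observations are exchangeable (no phylogeny).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts, invented for illustration only:
# 15 people carry the allele (12 with the mutation, 3 without),
# 25 people lack it (5 with the mutation, 20 without).
p_value = fisher_exact_two_sided(12, 3, 5, 20)
print(f"p = {p_value:.2e}")
```

A small p-value here would be read as evidence of an HLA-mutation association, but closely related sequences that share a mutation by descent would be counted as independent evidence, which is one way such a test can mislead.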