Darko Kirovski, David Heckerman, and Nebojsa Jojic
Recently Nickle et al. introduced a new model of genetic diversity that summarizes a large input dataset into a short sequence containing overlapping subsequences from the dataset. This model has direct applications to rational vaccine design. In this paper we formally investigate the combinatorics of the vaccine optimization problem. Here the vaccine is constructed as a sequence S of amino-acids such that as many of the most frequently occurring epitopes found in mutated viruses are subsequences to S. We rigorously present the related design optimization problem, establish its complexity, and present a simple probabilistic algorithm to find an efficient solution. Our vaccine designs show improvement of over 20% in the coverage score over the previously best designs and produce over 15% shorter vaccines that achieve equivalent epitope coverage.