This tool models genetic diversity by summarizing a large input dataset into an epitome, a short sequence capturing many overlapping subsequences from the dataset.
For example, when the tool is applied to modeling the diversity of HIV, the epitome yields relatively small vaccine immunogens that cover a large number of immune system targets, known as epitopes. In our experiments, the epitome included more epitopes than other vaccine designs of similar length, including cocktails of consensus strains, phylogenetic tree centers, and observed strains.
The tool optimizes greedily: at each step it lengthens the epitome by appending a patch from the data (possibly overlapping what is already there), choosing the patch that maximally increases the ratio of the summed weights of the included patches to the length of the epitome. The process can be stopped once the desired length is reached, rather than continuing until every patch is included, as in the superstring problem.
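The greedy step described above can be sketched in a few lines of Python. This is a simplified illustration, not the tool's actual code: the names `overlap`, `covered_weight`, and `greedy_epitome` are hypothetical, a patch is only tried as a new component or glued onto either end of an existing component, components are never merged with each other, and ties are broken by input order. Each step picks the extension with the largest gain in covered weight per added character.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def covered_weight(components, weights):
    """Total weight of the patches that occur inside some component."""
    return sum(w for p, w in weights.items()
               if any(p in c for c in components))


def greedy_epitome(weights, max_length):
    """Grow a list of components greedily (simplified sketch).

    weights: dict mapping patch -> weight.
    max_length: stop before the total length would exceed this value.
    """
    components = []
    while True:
        base = covered_weight(components, weights)
        used = sum(len(c) for c in components)
        best = None  # (gain per added character, new component list)
        for patch in weights:
            if any(patch in c for c in components):
                continue  # already covered
            # Option 1: start a new component.
            candidates = [components + [patch]]
            for i, c in enumerate(components):
                rest = components[:i] + components[i + 1:]
                # Option 2: append the patch to the right end of c.
                candidates.append(rest + [c + patch[overlap(c, patch):]])
                # Option 3: prepend the patch to the left end of c.
                keep = len(patch) - overlap(patch, c)
                candidates.append(rest + [patch[:keep] + c])
            for cand in candidates:
                length = sum(len(c) for c in cand)
                if length > max_length:
                    continue
                gain = covered_weight(cand, weights) - base
                score = gain / (length - used)
                if best is None or score > best[0]:
                    best = (score, cand)
        if best is None:  # nothing left to add within the length limit
            return components
        components = best[1]


# Example weights taken from the patch table in this document.
weights = {
    "NKIVRMYSP": 167, "LNKIVRMYS": 167, "PQDLNTMLN": 166, "QDLNTMLNT": 166,
    "GATPQDLNT": 166, "EGATPQDLN": 166, "ATPQDLNTM": 165, "TPQDLNTML": 165,
}
epitome = greedy_epitome(weights, max_length=30)
print(",".join(epitome), covered_weight(epitome, weights))
```

Because the candidate moves and tie-breaking here are assumptions, the intermediate steps of this sketch need not match the tool's step-by-step output.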
The tool accepts input in two formats.
First, a text table:
| Patch | Weight |
| --- | --- |
| NKIVRMYSP | 167 |
| LNKIVRMYS | 167 |
| PQDLNTMLN | 166 |
| QDLNTMLNT | 166 |
| GATPQDLNT | 166 |
| EGATPQDLN | 166 |
| ATPQDLNTM | 165 |
| TPQDLNTML | 165 |
Separate the columns with a space, tab, or comma. The header row is required.
The first column contains patches for possible inclusion in the epitome. The second column gives their relative weights.
This format is easily created with a spreadsheet program such as Excel. Transfer data to the tool either by copying (Ctrl-C) and pasting (Ctrl-V) or by saving the spreadsheet in text format and using the tool's "Upload File" button.
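For readers preparing input programmatically, the table format can be read with a short parser. The sketch below is only an illustration of the format as described here (a header row, then two columns separated by spaces, tabs, or commas); the function name `parse_patch_table` is hypothetical, and the check that the header reads "Patch" and "Weight" is an assumption, since the tool's own validation rules are not documented on this page.

```python
import re

# One or more spaces, tabs, or commas separate the two columns.
SEPARATOR = re.compile(r"[,\t ]+")


def parse_patch_table(text):
    """Return a list of (patch, weight) pairs from a two-column text table."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    header = [h.lower() for h in SEPARATOR.split(lines[0])]
    if header != ["patch", "weight"]:  # assumed header names
        raise ValueError("expected a 'Patch Weight' header row")
    rows = []
    for ln in lines[1:]:
        fields = SEPARATOR.split(ln)
        if len(fields) != 2:
            raise ValueError(f"expected two columns, got: {ln!r}")
        patch, weight = fields
        rows.append((patch, float(weight)))
    return rows


# The three separators can be mixed freely in this sketch.
sample = "Patch\tWeight\nNKIVRMYSP\t167\nLNKIVRMYS,167\nPQDLNTMLN 166\n"
rows = parse_patch_table(sample)
print(rows)
```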
The tool also accepts a second format: free text without weights. For example,
Twinkle, twinkle, little star;
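How free text becomes patches is not spelled out on this page. The tool's output for this kind of input shows uppercase words without punctuation, so one plausible reading is that each word is treated as a patch and repeated words count more. The sketch below encodes that guess; the function name `text_to_patches` and the rule of weighting each word by its number of occurrences are assumptions, not documented behavior.

```python
import re
from collections import Counter


def text_to_patches(text):
    """Split free text into uppercase word patches, weighted by count.

    Assumption: every run of letters is one patch and punctuation is
    ignored. The tool's real tokenization may differ.
    """
    words = re.findall(r"[A-Z]+", text.upper())
    return Counter(words)


patches = text_to_patches("Twinkle,\ntwinkle, little star;")
print(patches)
```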
When "Show Only Last" is unchecked, the tool shows the sequence of epitomes created. This output is tab-delimited and suitable for copying (Ctrl-A, Ctrl-C) and pasting (Ctrl-V) into a spreadsheet such as Excel. For the weighted table above, the output is:
| Method | AminoAcidLength | numComponents | coverage | Vaccine |
| --- | --- | --- | --- | --- |
| Greedy | 9 | 1 | 0.125753 | LNKIVRMYS |
| Greedy | 10 | 1 | 0.251506 | LNKIVRMYSP |
| Greedy | 17 | 1 | 0.376506 | PQDLNTMLNKIVRMYSP |
| Greedy | 18 | 1 | 0.500753 | TPQDLNTMLNKIVRMYSP |
| Greedy | 19 | 1 | 0.625 | ATPQDLNTMLNKIVRMYSP |
| Greedy | 20 | 1 | 0.75 | GATPQDLNTMLNKIVRMYSP |
| Greedy | 21 | 1 | 0.875 | EGATPQDLNTMLNKIVRMYSP |
| Greedy | 30 | 2 | 1 | EGATPQDLNTMLNKIVRMYSP,QDLNTMLNT |
For the free-text example, the output is:

| Method | AminoAcidLength | numComponents | coverage | Vaccine |
| --- | --- | --- | --- | --- |
| Greedy | 7 | 1 | 0.3 | TWINKLE |
| Greedy | 7 | 1 | 0.3 | TWINKLE |
| Greedy | 10 | 2 | 0.4 | TWINKLE,HOW |
| Greedy | 13 | 2 | 0.5 | WHATWINKLE,HOW |
| Greedy | 12 | 1 | 0.5 | HOWHATWINKLE |
| Greedy | 15 | 2 | 0.6 | HOWHATWINKLE,YOU |
| Greedy | 18 | 3 | 0.7 | HOWHATWINKLE,YOU,ARE |
| Greedy | 20 | 3 | 0.8 | HOWHATWINKLE,YOU,STARE |
| Greedy | 26 | 4 | 0.9 | HOWHATWINKLE,YOU,STARE,LITTLE |
| Greedy | 32 | 5 | 1 | HOWHATWINKLE,YOU,STARE,LITTLE,WONDER |
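The output columns can be recomputed from the Vaccine string and the input weights. In the output for the weighted patch table, the coverage values are consistent with the weight of the patches contained in some component divided by the total weight (for instance 167/1328 for the first row), AminoAcidLength with the combined length of the components, and numComponents with the number of comma-separated components. The sketch below reproduces those figures; the function name `summarize` is hypothetical, and the formulas are inferred from the example rather than taken from the tool's source.

```python
def summarize(vaccine, weights):
    """Return (length, number of components, coverage) for a Vaccine string.

    vaccine: components joined by commas, as in the Vaccine column.
    weights: dict mapping patch -> weight.
    """
    components = vaccine.split(",")
    length = sum(len(c) for c in components)
    covered = sum(w for p, w in weights.items()
                  if any(p in c for c in components))
    return length, len(components), covered / sum(weights.values())


# Weights from the patch table in this document.
weights = {
    "NKIVRMYSP": 167, "LNKIVRMYS": 167, "PQDLNTMLN": 166, "QDLNTMLNT": 166,
    "GATPQDLNT": 166, "EGATPQDLN": 166, "ATPQDLNTM": 165, "TPQDLNTML": 165,
}

for vaccine in ["LNKIVRMYS", "PQDLNTMLNKIVRMYSP",
                "EGATPQDLNTMLNKIVRMYSP,QDLNTMLNT"]:
    length, n, cov = summarize(vaccine, weights)
    print(length, n, round(cov, 6), vaccine)
```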