Epitome Creator Details

Overview

This tool models genetic diversity by summarizing a large input dataset into an epitome, a short sequence capturing many overlapping subsequences from the dataset.

For example, when the tool is applied to modeling the diversity of HIV, the epitome yields relatively small vaccine immunogens that cover a large number of immune system targets, known as epitopes. Our experiments have shown that the epitome includes more epitopes than other vaccine designs of similar length, including cocktails of consensus strains, phylogenetic tree centers, and observed strains.

The tool optimizes greedily: it iteratively lengthens the epitome by appending a patch from the data (possibly with overlap), choosing the patch that most improves the ratio of the total weight of the included patches to the length of the epitome. The process can be stopped once the desired length is reached, rather than only when the entire set of patches is included, as in the superstring problem.
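The greedy loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual implementation: the names `merge`, `covered_weight`, `greedy_epitome`, and `max_length` are hypothetical, and the score used here (newly covered weight per added character) is one plausible reading of the weight-to-length ratio. Intermediate steps may therefore differ from what the tool reports.

```python
def merge(left, right):
    """Join two strings, sharing the longest suffix of left that is a prefix of right."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return left + right


def covered_weight(components, weights):
    """Total weight of the patches that occur inside at least one component."""
    return sum(w for p, w in weights.items() if any(p in c for c in components))


def greedy_epitome(weights, max_length):
    """Grow a list of components until no further patch fits within max_length.

    Assumption: each step picks the move with the largest gain in covered
    weight per added character. The real tool may score moves differently.
    """
    components = []
    while True:
        base = covered_weight(components, weights)
        used = sum(len(c) for c in components)
        best = None  # (score, resulting components)
        for patch in weights:
            if any(patch in c for c in components):
                continue  # patch is already covered
            # Candidate moves: start a new component, or extend an
            # existing component on its right or left side.
            moves = [components + [patch]]
            for i, c in enumerate(components):
                for grown in (merge(c, patch), merge(patch, c)):
                    moves.append(components[:i] + [grown] + components[i + 1:])
            for move in moves:
                added = sum(len(c) for c in move) - used
                if used + added > max_length:
                    continue  # would exceed the requested length
                score = (covered_weight(move, weights) - base) / added
                if best is None or score > best[0]:
                    best = (score, move)
        if best is None:
            return components
        components = best[1]
```

Calling `greedy_epitome(weights, max_length)` with a dictionary that maps each patch to its weight returns the list of components. Stopping at `max_length` mirrors the point that the process can end before every patch is included.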

Input Format

The tool accepts input in two formats.

First, a text table:

Patch      Weight
NKIVRMYSP  167
LNKIVRMYS  167
PQDLNTMLN  166
QDLNTMLNT  166
GATPQDLNT  166
EGATPQDLN  166
ATPQDLNTM  165
TPQDLNTML  165

Separate the columns with a space, a tab, or a comma. The header row is required.

The first column contains patches for possible inclusion in the epitome. The second column gives their relative weights.

This format is easily created with a spreadsheet program such as Excel. Transfer data to the tool either by copying (Ctrl-C) and pasting (Ctrl-V) or by saving the spreadsheet in text format and using the tool's "Upload File" button.
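For readers preparing input programmatically, the table format can be read with a few lines of code. The sketch below is only an illustration: the function name `parse_patch_table` is hypothetical, and treating the header names as case-insensitive is an assumption, since the page states only that the headers are required.

```python
import re


def parse_patch_table(text):
    """Parse 'Patch Weight' rows whose columns are separated by spaces, tabs, or commas."""
    rows = [re.split(r"[,\s]+", line.strip())
            for line in text.splitlines() if line.strip()]
    # Assumption: the header must read Patch / Weight, in any letter case.
    if not rows or [h.lower() for h in rows[0]] != ["patch", "weight"]:
        raise ValueError("expected a 'Patch Weight' header row")
    weights = {}
    for fields in rows[1:]:
        if len(fields) != 2:
            raise ValueError(f"expected two columns, got {fields!r}")
        patch, weight = fields
        weights[patch] = float(weight)
    return weights
```

The result maps each patch to its relative weight, which is a convenient shape for checking a file before uploading it.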

The tool also accepts a second format: free text without weights. For example,

Twinkle, twinkle, little star;
How I wonder what you are.
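The page does not say how free text is split into patches. Judging from the sample output for this rhyme, where coverage rises in steps of 0.1 over ten words, one plausible reading is that each word becomes an upper-case patch with punctuation dropped, and that repeated words add to the weight. The sketch below encodes that assumption; `text_to_patches` is a hypothetical name.

```python
import re
from collections import Counter


def text_to_patches(text):
    """Turn free text into {WORD: count}, assuming one patch per word."""
    # Assumption: letters only, upper-cased; punctuation and spaces are separators.
    words = re.findall(r"[A-Z]+", text.upper())
    return dict(Counter(words))
```

Under this reading the two-line rhyme yields nine distinct patches with a total weight of ten, because TWINKLE occurs twice.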

 

Output Format

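When "Show Only Last" is unchecked, the tool shows the full sequence of epitomes created, one row per step. This output is tab-delimited and suitable for copying (Ctrl-A, Ctrl-C) and pasting (Ctrl-V) into a spreadsheet such as Excel. The first table below is the output for the weighted patch example, and the second is the output for the free-text example.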

Method  AminoAcidLength  numComponents  coverage  Vaccine
Greedy  9                1              0.125753  LNKIVRMYS
Greedy  10               1              0.251506  LNKIVRMYSP
Greedy  17               1              0.376506  PQDLNTMLNKIVRMYSP
Greedy  18               1              0.500753  TPQDLNTMLNKIVRMYSP
Greedy  19               1              0.625     ATPQDLNTMLNKIVRMYSP
Greedy  20               1              0.75      GATPQDLNTMLNKIVRMYSP
Greedy  21               1              0.875     EGATPQDLNTMLNKIVRMYSP
Greedy  30               2              1         EGATPQDLNTMLNKIVRMYSP,QDLNTMLNT

Method  AminoAcidLength  numComponents  coverage  Vaccine
Greedy  7                1              0.3       TWINKLE
Greedy  7                1              0.3       TWINKLE
Greedy  10               2              0.4       TWINKLE,HOW
Greedy  13               2              0.5       WHATWINKLE,HOW
Greedy  12               1              0.5       HOWHATWINKLE
Greedy  15               2              0.6       HOWHATWINKLE,YOU
Greedy  18               3              0.7       HOWHATWINKLE,YOU,ARE
Greedy  20               3              0.8       HOWHATWINKLE,YOU,STARE
Greedy  26               4              0.9       HOWHATWINKLE,YOU,STARE,LITTLE
Greedy  32               5              1         HOWHATWINKLE,YOU,STARE,LITTLE,WONDER
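The coverage values in both tables appear consistent with the fraction of the total patch weight whose patches occur somewhere in the Vaccine string. That is an inference from the numbers rather than a documented definition, so the sketch below is a way to sanity-check a row, not the tool's own formula; `coverage` is a hypothetical helper.

```python
def coverage(vaccine, weights):
    """Fraction of total patch weight found in any comma-separated component."""
    components = vaccine.split(",")
    total = sum(weights.values())
    covered = sum(w for patch, w in weights.items()
                  if any(patch in c for c in components))
    return covered / total
```

With the eight weighted patches from the input example, `coverage("LNKIVRMYSP", weights)` covers two patches of weight 167 out of a total of 1328, which agrees with the 0.251506 in the second row of the first table.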