    Epitome Creator Details

    Overview

    This tool models genetic diversity by summarizing a large input dataset into an epitome, a short sequence capturing many overlapping subsequences from the dataset.

    For example, when the tool is applied to modeling the diversity of HIV, it produces a relatively small vaccine immunogen that covers a large number of immune-system targets, known as epitopes. Our experiments have shown that the epitome includes more epitopes than other vaccine designs of similar length, including cocktails of consensus strains, phylogenetic tree centers, and observed strains.

    The tool optimizes greedily: it iteratively lengthens the epitome by appending the patch from the data (possibly with overlap) that yields the highest ratio of the summed weights of the included patches to the length of the epitome. The process can be stopped once the desired length is reached, rather than only when the entire set of patches is included, as in the superstring problem.
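    The greedy idea described above can be sketched roughly as follows. This is a simplified illustration, not the tool's actual implementation: the function names (`overlap`, `covered_weight`, `greedy_epitome`) are hypothetical, the sketch counts every patch that appears as a substring of the epitome as included, and it never merges two existing components. Its intermediate steps may therefore differ from the tool's output.

```python
from typing import Iterator, List, Tuple

Patch = Tuple[str, float]


def overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def covered_weight(components: List[str], patches: List[Patch]) -> float:
    """Total weight of the patches found as substrings of any component."""
    return sum(w for p, w in patches if any(p in c for c in components))


def greedy_epitome(patches: List[Patch], max_length: int) -> Iterator[Tuple[int, float, List[str]]]:
    """Yield (length, coverage, components) after each greedy step (assumed scoring rule)."""
    total = sum(w for _, w in patches)
    components: List[str] = []
    while True:
        best = None  # (ratio of covered weight to length, candidate components)
        for patch, _ in patches:
            if not patch or any(patch in c for c in components):
                continue  # empty or already covered
            # Option 1: start a new component with this patch.
            candidates = [components + [patch]]
            # Option 2: glue the patch onto either end of an existing component.
            for i, c in enumerate(components):
                right = c + patch[overlap(c, patch):]
                left = patch + c[overlap(patch, c):]
                for merged in (right, left):
                    candidates.append(components[:i] + [merged] + components[i + 1:])
            for cand in candidates:
                length = sum(len(c) for c in cand)
                if length > max_length:
                    continue
                ratio = covered_weight(cand, patches) / length
                if best is None or ratio > best[0]:
                    best = (ratio, cand)
        if best is None:
            return  # everything is covered, or nothing else fits
        components = best[1]
        yield (sum(len(c) for c in components),
               covered_weight(components, patches) / total,
               list(components))


patches = [("NKIVRMYSP", 167), ("LNKIVRMYS", 167), ("PQDLNTMLN", 166),
           ("QDLNTMLNT", 166), ("GATPQDLNT", 166), ("EGATPQDLN", 166),
           ("ATPQDLNTM", 165), ("TPQDLNTML", 165)]
steps = list(greedy_epitome(patches, max_length=30))
for length, cov, comps in steps:
    print(length, round(cov, 6), ",".join(comps))
```

    Stopping is controlled by `max_length`, which mirrors the point that the process can end at a desired length instead of running until every patch is included.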

    Input Format

    The tool accepts input in two formats.

    First, a text table:

    Patch      Weight
    NKIVRMYSP  167
    LNKIVRMYS  167
    PQDLNTMLN  166
    QDLNTMLNT  166
    GATPQDLNT  166
    EGATPQDLN  166
    ATPQDLNTM  165
    TPQDLNTML  165

    Separate the columns with a space, tab, or comma. The header row is required.

    The first column contains patches for possible inclusion in the epitome. The second column gives their relative weights.

    This format is easily created with a spreadsheet program such as Excel. Transfer data to the tool either by copying (Ctrl-C) and pasting (Ctrl-V) or by saving the spreadsheet in text format and using the tool's "Upload File" button.
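    To check a table before pasting it into the tool, the format rules above can be expressed as a small parser. This is a minimal sketch: `parse_patch_table` is a hypothetical helper, and treating the header names case-insensitively is an assumption, not documented behavior of the tool.

```python
import re
from typing import List, Tuple


def parse_patch_table(text: str) -> List[Tuple[str, float]]:
    """Parse a Patch/Weight table whose columns are split by space, tab, or comma."""
    rows = [re.split(r"[,\s]+", line.strip())
            for line in text.splitlines() if line.strip()]
    # The header row is required (case-insensitive match is an assumption).
    if not rows or [h.lower() for h in rows[0]] != ["patch", "weight"]:
        raise ValueError("expected a header row: Patch Weight")
    patches = []
    for row in rows[1:]:
        if len(row) != 2:
            raise ValueError(f"expected two columns, got {row!r}")
        patches.append((row[0], float(row[1])))
    return patches


sample = "Patch\tWeight\nNKIVRMYSP\t167\nLNKIVRMYS,167\nPQDLNTMLN 166\n"
table = parse_patch_table(sample)
print(table)
```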

    The tool also accepts a second format: free text without weights. For example,

    Twinkle, twinkle, little star;
    How I wonder what you are.
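    The page does not say how free text is turned into patches. The example output for this rhyme (upper-case words such as TWINKLE and HOW, with coverage rising in steps of 0.1 across ten words) suggests that each word becomes a patch of equal weight. Under that assumption, which is an inference and not documented behavior, the conversion could look like this; `text_to_patches` is a hypothetical name.

```python
import re
from collections import Counter
from typing import List, Tuple


def text_to_patches(text: str) -> List[Tuple[str, int]]:
    """Split free text into upper-case words; a word's weight is its count (assumed rule)."""
    words = re.findall(r"[A-Za-z]+", text.upper())
    return list(Counter(words).items())


rhyme = "Twinkle, twinkle, little star;\nHow I wonder what you are."
word_patches = text_to_patches(rhyme)
print(word_patches)
```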

     

    Output Format

    When "Show Only Last" is unchecked, the tool shows the sequence of epitomes created. This output is tab-delimited and suitable for copying (Ctrl-A, Ctrl-C) and pasting (Ctrl-V) into a spreadsheet such as Excel.

    Method  AminoAcidLength  numComponents  coverage  Vaccine
    Greedy  9                1              0.125753  LNKIVRMYS
    Greedy  10               1              0.251506  LNKIVRMYSP
    Greedy  17               1              0.376506  PQDLNTMLNKIVRMYSP
    Greedy  18               1              0.500753  TPQDLNTMLNKIVRMYSP
    Greedy  19               1              0.625     ATPQDLNTMLNKIVRMYSP
    Greedy  20               1              0.75      GATPQDLNTMLNKIVRMYSP
    Greedy  21               1              0.875     EGATPQDLNTMLNKIVRMYSP
    Greedy  30               2              1         EGATPQDLNTMLNKIVRMYSP,QDLNTMLNT

     

    For the free-text example, the output is:

    Method  AminoAcidLength  numComponents  coverage  Vaccine
    Greedy  7                1              0.3       TWINKLE
    Greedy  7                1              0.3       TWINKLE
    Greedy  10               2              0.4       TWINKLE,HOW
    Greedy  13               2              0.5       WHATWINKLE,HOW
    Greedy  12               1              0.5       HOWHATWINKLE
    Greedy  15               2              0.6       HOWHATWINKLE,YOU
    Greedy  18               3              0.7       HOWHATWINKLE,YOU,ARE
    Greedy  20               3              0.8       HOWHATWINKLE,YOU,STARE
    Greedy  26               4              0.9       HOWHATWINKLE,YOU,STARE,LITTLE
    Greedy  32               5              1         HOWHATWINKLE,YOU,STARE,LITTLE,WONDER
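    The coverage column is consistent with the following reading: the summed weight of the patches that occur as substrings of some comma-separated component of the Vaccine column, divided by the total weight of all patches. The sketch below checks that reading against the weighted example table; `coverage` is a hypothetical helper and the formula is inferred from the example values, not taken from the tool's source.

```python
from typing import List, Tuple


def coverage(vaccine: str, patches: List[Tuple[str, float]]) -> float:
    """Fraction of total patch weight contained in the comma-separated components."""
    components = vaccine.split(",")
    total = sum(w for _, w in patches)
    hit = sum(w for p, w in patches if any(p in c for c in components))
    return hit / total


hiv_patches = [("NKIVRMYSP", 167), ("LNKIVRMYS", 167), ("PQDLNTMLN", 166),
               ("QDLNTMLNT", 166), ("GATPQDLNT", 166), ("EGATPQDLN", 166),
               ("ATPQDLNTM", 165), ("TPQDLNTML", 165)]

print(round(coverage("LNKIVRMYS", hiv_patches), 6))              # 0.125753
print(round(coverage("EGATPQDLNTMLNKIVRMYSP", hiv_patches), 6))  # 0.875
print(coverage("EGATPQDLNTMLNKIVRMYSP,QDLNTMLNT", hiv_patches))  # 1.0
```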