FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a program for performing genome-wide association studies (GWAS) on large data sets.
Open source Python code for benchmarking and evaluating GWAS algorithms.
An open source Python library for reading and manipulating genetic data. It can, for example, efficiently read whole PLINK *.bed/bim/fam files or parts of those files. It can also efficiently manipulate ranges of integers using set operators such as union, intersection, and difference.
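The set operators on integer ranges mentioned above can be illustrated with a minimal sketch in plain Python. This is not the library's API: the function names and the representation of a range set as a list of inclusive `(start, last)` tuples are assumptions made for illustration only.

```python
def normalize(ranges):
    """Sort inclusive (start, last) ranges and merge overlapping or adjacent ones."""
    merged = []
    for start, last in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((start, last))
    return merged


def union(a, b):
    """All integers that appear in either range set."""
    return normalize(list(a) + list(b))


def intersection(a, b):
    """All integers that appear in both range sets."""
    a, b = normalize(a), normalize(b)
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance whichever range ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def difference(a, b):
    """All integers in range set a that are not in range set b."""
    a, b = normalize(a), normalize(b)
    out = []
    for start, last in a:
        cur = start
        for b_start, b_last in b:
            if b_last < cur:
                continue
            if b_start > last:
                break
            if b_start > cur:
                out.append((cur, b_start - 1))
            cur = b_last + 1
            if cur > last:
                break
        if cur <= last:
            out.append((cur, last))
    return out
```

Working on range endpoints rather than on individual integers keeps the cost proportional to the number of ranges, which is what makes this kind of representation attractive for long runs of SNP indices.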
Prediction of Cytosolic Stability of HIV-Derived Peptides
Results of SNP-pair epistasis GWAS (genome-wide association study) on the Wellcome Trust data. P values are based on a likelihood ratio test comparing the likelihood of a model with a multiplicative term against that of an additive linear model. 63.5 billion SNP pairs were evaluated for seven common diseases (type I diabetes, type II diabetes, coronary artery disease, hypertension, rheumatoid arthritis, Crohn's disease, and bipolar disorder). If you need more results than can be provided through this link, please email a request to firstname.lastname@example.org, email@example.com, and firstname.lastname@example.org.
This study makes use of data generated by the Wellcome Trust Case-Control Consortium. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under awards 076113 and 085475.
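As a rough illustration of how a likelihood ratio test turns two model likelihoods into a P value, here is a minimal sketch. It assumes the two models differ by a single parameter (the multiplicative term), so that the test statistic is compared against a chi-square distribution with one degree of freedom; the function name and the log-likelihood inputs are hypothetical, and this is not the code used to produce the results above.

```python
import math


def likelihood_ratio_p_value(loglik_null, loglik_alt):
    """P value for a likelihood ratio test with one degree of freedom.

    loglik_null: maximized log-likelihood of the additive model (assumed input).
    loglik_alt:  maximized log-likelihood of the model with the extra
                 multiplicative term (assumed input).
    """
    stat = 2.0 * (loglik_alt - loglik_null)
    stat = max(stat, 0.0)  # guard against tiny negative values from round-off
    # For a chi-square variable with 1 degree of freedom,
    # P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))
```

With tens of billions of SNP pairs, P values of this kind still need a multiple-testing correction before any pair is called significant.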
Contig ploidy and allele dosage estimation ConPADE
Prediction tool related to: E. Lazaro, C. Kadie, P. Stamegna, S. C. Zhang, P. Gourdain, N. Y. Lai, M. Zhang, S. A. Martinez, D. Heckerman, S. Le Gall. Variable HIV Peptide Stability in Human Cytosol Is Critical to Epitope Presentation and Immune Escape, J. Clin. Invest. 2011. The Stability Prediction tool takes a list of HIV-derived peptides, each 8 to 11 amino acids long, as input. The stability rate is calculated using non-linear regression (one-phase exponential decay) of the degradation profile over 30 minutes for an average of 3 to 5 degradation experiments in cytosolic extracts from human primary cells (peripheral blood mononuclear cells) of different donors.
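To make the one-phase exponential decay model concrete, the sketch below fits y(t) = y0 * exp(-k * t) to a degradation profile. Note the substitution: instead of the non-linear regression described above, it uses a simpler log-linear least-squares fit, which gives the same answer on noise-free data but weights noisy points differently. The function names, the time points, and the half-life helper are assumptions for illustration, not part of the tool.

```python
import math


def fit_exponential_decay(times, remaining):
    """Fit ln(remaining) = ln(y0) - k * t by ordinary least squares.

    times:     measurement times in minutes (hypothetical sampling points).
    remaining: amount of intact peptide at each time, all values > 0.
    Returns (y0, k), where k is the decay rate per minute.
    """
    logs = [math.log(y) for y in remaining]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    slope = sxy / sxx
    y0 = math.exp(mean_l - slope * mean_t)
    return y0, -slope


def half_life(k):
    """Time for the peptide amount to halve, given decay rate k."""
    return math.log(2) / k
```

For example, `fit_exponential_decay([0, 10, 30], [100.0, 60.0, 22.0])` returns an initial amount and a per-minute decay rate that can then be passed to `half_life`.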
Correction for Hidden Confounders in the Genetic Analysis of Gene Expression
ConPADE is a tool for contig ploidy estimation for genome assemblies of complex polyploid plant genomes from whole genome shotgun sequencing data. It also calls SNPs and provides estimates of allele dosages.
Software and associated materials to accompany the following paper: Correction for Hidden Confounders in the Genetic Analysis of Gene Expression, Jennifer Listgarten, Carl Kadie, Eric Schadt, David Heckerman, Proceedings of the National Academy of Sciences, in press. This software has two main functions: (1) to perform association scans (e.g. genome-wide association scans) using a linear mixed model, and (2) to learn an appropriate Expression Heterogeneity kernel for use with linear mixed models when looking for associations with gene expression data as the target.
Pathogens live and reproduce inside the human host, whose immune system continually tries to rid the body of these pathogens. This leads to a tug-of-war between the pathogen and the human host, where the pathogen tries to adapt so as to "escape" the immune system, while the immune system learns to recognize and eliminate new foreign pathogens. Key players in the immune system are the HLA proteins, each of which can recognize specific short fragments of foreign (e.g. HIV) proteins, called epitopes, in infected cells and then alert the immune system to their presence. For rapidly evolving pathogens like HIV, a key defense mechanism is to evolve mutations that prevent the HLA proteins from recognizing the viral epitopes. This evolution takes place anew in each patient, as each patient has a different set of HLA proteins that recognize different epitopes. PhyloD is a suite of statistical tools that can identify HIV mutations that defeat the function of the HLA proteins in certain patients, thereby allowing the virus to escape elimination by the immune system. By applying this tool to large studies of infected patients, researchers are now able to start decoding the complex rules that govern the HIV mutations, in the hope of one day creating a vaccine to which the virus is unable to develop resistance. See also our GitHub page.
This tool computes the probability that a given k-mer is a T-cell epitope restricted to a given HLA allele. The tool can scan for 8-, 9-, 10-, and 11-mer epitopes across all common HLA alleles.
HLA sequence typing sometimes yields uncertain results. For example, an allele may be identified as A6801/6802 or simply A02. This tool takes the uncertain information and (probabilistically) expands it to four-digit alleles, making use of linkage disequilibrium to inform the expansion.
One way to find epitopes is to do lab studies such as ELISPOT. One problem with this approach is that, if you see a reaction in a patient, you do not know which of the patient's HLA genes are responsible for the reaction. This tool takes lab data from a series of patients and determines (probabilistically) which HLA genes are responsible for the reaction.
False Discovery Rate for 2X2 Contingency Tables
This tool takes a weighted list of amino acid sequences as input. It creates epitomes of all lengths.
Fisher Exact Test of Independence for 2X2 Contingency Tables
The false discovery rate (FDR) estimates the proportion of false positives among those tests that are deemed significant. This tool computes the FDR for 2x2 contingency tables based on the Fisher exact test statistic.
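For readers unfamiliar with FDR estimation, the sketch below shows one standard approach, the Benjamini-Hochberg procedure, applied to a list of P values. This is a swapped-in textbook technique chosen for illustration; the description above does not say which estimator the tool uses, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (q-values), in the input order.

    The i-th smallest of m P values is scaled by m / i, and the results are
    made monotone by taking a running minimum from the largest P value down.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

Calling a test significant when its q-value is at most 0.05 aims to keep the expected share of false positives among the significant tests at or below 5%.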
The Fisher exact test is a statistical significance test for categorical data that measures the association between two variables in a 2x2 contingency table.
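A minimal sketch of the two-sided Fisher exact test for a single 2x2 table follows. It uses the common convention of summing the probabilities of all tables (with the same margins) that are no more likely than the observed one; the function name and argument layout are assumptions, and the tool itself may use a different convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo = max(0, row1 + col1 - n)  # smallest possible top-left count
    hi = min(row1, col1)          # largest possible top-left count

    def weight(x):
        # Unnormalized hypergeometric probability of top-left count x.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    # Integer weights make the "no more likely than observed" comparison exact.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, row1)
```

For the table [[8, 2], [1, 5]], the tables at least as extreme as the observed one account for 280 of the 8008 equally weighted arrangements, giving a P value of about 0.035.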