External Research: Computational Challenges of Genome Wide Association Studies (GWAS) Awards

Genome-wide association studies (GWAS) aim to correlate patterns of genomic variation with phenotypes such as responses to drugs, disease, aging, or the expression of genetic disorders. The National Institutes of Health see GWAS as "laying the groundwork for an era of personalized medicine." Microsoft Research created this project to aid researchers, scientists, and clinicians in this endeavor.

Combining clinical and phenotype data with enabling solutions in this area will ultimately help improve disease prediction and patient care, particularly for common diseases with a genetic predisposition, such as diabetes, asthma, cancer, and heart disease. We are honored to support continued research in an area that is likely to have a broad impact on the future of worldwide health.

Computational Challenges of Genome Wide Association Studies (GWAS) Award Recipients

PGRx: An Interactive Software System for Integrating Clinical Genotyping with Prescription Drug Safety Assurance
Michael Kane, Purdue University, United States
John Springer, Purdue University, United States

This proposal aims to develop a software and data management system (the PGRx system) that uses patient-specific genotyping to predict and prevent adverse drug responses, and that supports the prescription drug process from physician to pharmacist to consumer. More specifically, the PGRx system will use specific allelic variants associated with drug metabolism, as well as other common laboratory tests, to identify patients who are predisposed to an adverse drug reaction and to recommend the best course of action for a particular drug and patient. The PGRx system will also provide in-depth training that helps the user (physician or pharmacist) better understand the link between DNA (genes), drug metabolism (enzymes), and the risk of adverse responses to specific prescription medicines.

A Universal Data Format for Genotype Microarrays
John Pearson
Translational Genomics Research Institute, United States

We propose to address the practical difficulties of combining data generated on genotype microarrays from different vendors by creating a universal data format (UDF). This format would accommodate intensity-level data (not genotypes) from multiple vendor platforms in a single file and would be implemented as a C++ software library. The initial design of the format has been completed as part of an ongoing NIH-funded grant under the Enhancing Development of Genome-wide Association Methods (ENDGAME) Consortium. If genotype microarray vendors adopted the UDF as a data output format, analysis would be simpler, and analysis tool developers would need to write input routines for only a single format.

Genome-wide Association Study of Amyotrophic Lateral Sclerosis in Finland
Bryan Traynor
National Institutes of Health; Johns Hopkins Hospital, United States

The overall purpose of this project is to discover the genes relevant to the pathogenesis of amyotrophic lateral sclerosis (ALS), a rapidly progressive, fatal neurodegenerative disorder, by undertaking a genome-wide association study of 489 Finnish ALS cases and 515 Finnish controls using Illumina BeadChips. Raw genotyping data will be made publicly available so that other researchers can mine the dataset. We expect the data generated by this project to significantly advance the field of ALS genetics by identifying genes involved in the motor neuron degeneration of sporadic ALS. Identifying the genetic basis of sporadic ALS may yield a new array of druggable targets and may facilitate the development of new therapeutic agents that slow ALS disease progression. A more comprehensive understanding of ALS biology may also provide insight into the pathogenesis of other neurodegenerative diseases, such as Parkinson's disease and Alzheimer's disease.

Pathway-based Association: A New Paradigm for Genome-wide Association Studies
Trey Ideker, University of California, San Diego, United States
Richard Karp, University of California, Berkeley, United States

Our project will explain the associations captured by GWAS in terms of known networks of gene and protein interactions. We will develop computational tools that query these independently derived networks to identify the pathways and sub-networks of interactions underlying the observed set of genome-wide associations. This framework is intended to improve the power of current GWAS by identifying genes in loci with borderline significance that nonetheless lie close, in the network, to significant genes. Furthermore, it will provide a list of putative physical pathways incorporating the causal genes necessary to affect the phenotype. If successful, network- and pathway-based analysis will be directly applicable to the interpretation of genotypes for personalized medicine.
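To make the network-proximity idea concrete, here is a minimal sketch, not the awardees' method. It flags genes from borderline-significance loci that lie within a few interaction steps of a genome-wide significant gene. Everything here is an illustrative assumption: the toy interaction graph, the hypothetical gene names (`GENE_A` and so on), the p-value thresholds `SIGNIFICANT_P` and `BORDERLINE_P`, and the use of a simple hop-count cutoff `max_hops`.

```python
from collections import deque

# Hypothetical thresholds, chosen only for illustration.
SIGNIFICANT_P = 5e-8   # treated here as "genome-wide significant"
BORDERLINE_P = 1e-5    # below this (but not significant) counts as "borderline"


def network_distances(graph, sources, max_hops):
    """Multi-source breadth-first search over an adjacency dict.

    Returns {gene: hop distance to the nearest source gene} for every
    gene reachable within max_hops of at least one source gene.
    """
    dist = {g: 0 for g in sources if g in graph}
    queue = deque(dist)
    while queue:
        gene = queue.popleft()
        if dist[gene] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(gene, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[gene] + 1
                queue.append(neighbor)
    return dist


def rescue_borderline_genes(graph, gene_pvalues, max_hops=1):
    """Return, sorted, the borderline genes within max_hops of a significant gene."""
    significant = {g for g, p in gene_pvalues.items() if p < SIGNIFICANT_P}
    borderline = {
        g for g, p in gene_pvalues.items() if SIGNIFICANT_P <= p < BORDERLINE_P
    }
    dist = network_distances(graph, significant, max_hops)
    return sorted(g for g in borderline if g in dist)


# Toy undirected interaction network with hypothetical gene names.
graph = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_A"],
    "GENE_C": ["GENE_A", "GENE_D"],
    "GENE_D": ["GENE_C"],
    "GENE_E": [],
}
pvalues = {"GENE_A": 1e-9, "GENE_B": 2e-6, "GENE_D": 3e-6, "GENE_E": 4e-6}

print(rescue_borderline_genes(graph, pvalues, max_hops=1))  # → ['GENE_B']
```

A single multi-source search gives each gene its distance to the nearest significant gene in one pass. This sketch does not weight interactions, and it does not test whether the observed proximity exceeds what chance would produce in a densely connected network. A real analysis would need to address both.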

Data Quality Management for Model Improvement in GWAS
Raul Ruggia, University of the Republic of Uruguay
Hugo Naya, Pasteur Institute at Montevideo, Uruguay

This project addresses the problem of building a data quality management environment for the biological domain that would enable users to define and evaluate biology-oriented data quality properties over specific data sources. These biology-oriented properties would be defined in terms of basic quality properties, so the environment could reuse existing techniques for managing those basic properties. The main expected outcomes are a set of biology-oriented data quality properties and a prototype environment for managing and evaluating these properties on biological databases. In this project, we will focus on quality properties related to metadata descriptions of biological data sources.

Phenotypic Pipeline for Genome-wide Association Studies
George Hripcsak
Columbia University, United States

Large-scale studies involving many subjects, and even smaller studies whose subjects are selected from a larger population, will require innovative means of extracting a reliable, useful phenotype from electronic health records data. We propose to create a phenotypic framework with a pipeline architecture that uses advanced informatics methods to convert raw health records data into a usable phenotype. We will publish the framework and evaluation methods, make the new components available, and demonstrate the system on a beta2-adrenergic receptor genotype cohort by extracting the phenotype from health records data.