Speaker Mehmet Dalkilic
Affiliation Indiana University
Host Dan Fay
Date recorded 6 October 2005
One of the greatest benefits of escience (the use of distributed computing and data resources for scientific discovery) is the opportunity for scientists to work with data sets that would otherwise have been too large to handle and, consequently, to ask questions that would not have been possible before. Escience faces many obvious challenges because of its distributed nature, but there are other challenges that, while not unique to escience, remain sufficiently domain-sensitive that solutions do not seem easily shareable. One particularly difficult problem is integration: how to coherently bring together disparate, massive data sets. Focus has generally been placed on the physical layer (to borrow from the three layers of data modeling), where details of implementation predominate. This problem will likely continue, though there is some hope in leveraging “smart” architectures like smart clients. Logical integration, that is, how to meaningfully bring together massive, disparate data sets from the scientists’ perspective, is even more challenging. Another challenge of escience is creating meaningful, interactive visualizations of massive data sets. A direct benefit of this kind of visualization is that it allows the scientist to explore freely in a setting that is more familiar and intuitive. In this presentation we will discuss three ongoing projects that address the challenges of integration and visualization: CATPA (Curation and Alignment Tool for Protein Analysis), INGeNE (Integrated, Gene Network Explorer), and SNPEx (SNP Explorer). CATPA is a smart client application that allows for the curation of protein families at the residue level, including deletions. Interaction is done visually. INGeNE is an application that allows for functional genomic discovery by building networks of relationships where an edge is determined by a combination of microarray data, protein-protein data, gene-gene interaction data, and phenotypic expression data.
SNPEx is an application that includes a novel algorithm to find the most informative set of tagging SNPs. Additionally, we implemented SNPEx in both Java/MySQL and C#/SQL Server 2000 to compare the performance of the two systems, and found the latter to be superior in our suite of tests.
©2005 Microsoft Corporation. All rights reserved.