Stuart Ozer
Microsoft eScience Research Group
Tel: 415-778-8235
I am a researcher in Microsoft Research’s San Francisco-based eScience group, working on data organization and on tools and algorithms for analysis in the biological and health sciences. My efforts focus on a set of collaborations with university and government research teams spanning RNA structural genomics, protein folding, clinical data warehousing, and environmental sensor networks. My goals include making data-intensive scientific research more productive by fostering work with higher-level tools, and creatively applying broadly available data mining and visualization software in scientific settings.
Collaboration Highlights
In collaboration with David Baker’s lab at the University of Washington, we created and deployed a system in 2006 that populates a database with key metrics from the Rosetta@home community computing project, which predicts the three-dimensional shape of proteins. The system uses SQL Server 2005 and Reporting Services to make experimental data available on request, via graphical charts over the public internet, to participants who have contributed their computer time. The facility is fully integrated with Rosetta’s BOINC-based service and web-presence architecture. We are currently expanding the service to deliver a rich set of graphical presentations to researchers conducting experiments, allowing them to visually track the effectiveness of new prediction algorithms and methods against prior results.
With Robin Gutell and the Gutell Lab at the University of Texas at Austin, we have architected a completely new approach to representing and mining aligned RNA sequence data for comparative analysis. We persist the individual nucleotides from every sequence in a database table, joined to both phylogenetic information about the sequence and structural annotations about the nucleotide’s position. Algorithms that predict the secondary structure of the molecules, as well as queries that gather statistics on the composition of discovered structures, are implemented inside the database as .NET stored procedures, increasing the performance, scalability and flexibility of these analytical techniques severalfold over prior methods that used flat files and Perl scripts. The ultimate goals of this collaboration are to (1) improve the prediction of an RNA’s secondary and tertiary structure, (2) better enable the reconstruction of phylogenetic relationships and (3) enhance the automation of the tools that discover and align new sequences in existing RNA families. An overview of the lab’s work and RNA data curation efforts can be found at the Comparative RNA Web site.
With Jack Bates’ team at the US Veterans Health Administration (VHA), I am collaborating on the physical design of their Enterprise Data Warehouse (EDW) and on models for its use in research settings. The EDW is a secure, anonymized, multi-terabyte query environment that is evolving to contain all facets of historical medical records from the VHA’s network of over 1200 clinics and hospitals. Among its many applications, the environment is actively used to improve the quality of patient care and to study trends in chronic conditions such as diabetes and obesity over time and under various therapies.
Working with Katalin Szlavecz, Andreas Terzis and Alex Szalay of Johns Hopkins, and Jim Gray of Microsoft Research, I designed a data cube oriented to the unique problems of visualizing and exploring information from environmental sensor networks. Using the cube, we have been investigating the effectiveness of tools such as Excel, ProClarity and Tableau in supporting scientific queries and associated data analysis tasks.
Papers
2007
The work we have done with Rosetta@home and the BOINC infrastructure, offering participants and researchers experimental result data on demand in community computing projects, is described in some detail in “Reporting@Home: Delivering Dynamic Graphical Feedback to Participants and Researchers in Community Computing Projects,” with David Kim and David Baker at UW.
2006
An overview of the physical and data architecture supporting the sensor network at Johns Hopkins can be found in “Life Under Your Feet: An End-to-End Soil Ecology Sensor Network, Database, Web Server, and Analysis Service,” with Katalin Szlavecz, Andreas Terzis, Razvan Musaloiu-E, Joshua Cogan, Sam Small, Randal Burns, and Alex Szalay of JHU.
A shorter presentation of the above work, with a focus on and additional detail about data cube design and usage, is available in “Using Data-Cubes in Science: an Example from Environmental Monitoring of the Soil Ecosystem,” with Katalin Szlavecz, Andreas Terzis, Razvan Musaloiu-E, Joshua Cogan and Alex Szalay of JHU.
I am an occasional contributor to a SQL Server Customer Advisory Team blog, which has interesting reading for anyone building large-scale SQL systems.
Brief Bio
I joined Microsoft Research in April 2006. Previously, I was a member and manager of the SQL Server Customer Advisory Team, an organization within the SQL Server product group dedicated to architecting and learning from the largest-scale deployments of the database system worldwide. When I first joined Microsoft in 1997, I managed a team responsible for creating and operating a multi-terabyte clickstream data warehouse, and associated decision support applications, in support of online services initiatives at the company. Prior to Microsoft, I held software product planning and program management positions oriented to large databases and products for their analysis, at companies including Metaphor Computer Systems and IBM. In the mid-1990s I co-founded the data warehousing consultancy Infodynamics in the San Francisco Bay Area.