Yogesh Simmhan

Postdoc Researcher

eScience Group

Microsoft External Research 

Los Angeles Office
1100 Glendon Ave, Suite 1080,
Los Angeles, CA 90024
Redmond Office
B99/4611, 14820 NE 36th St, 
Redmond, WA 98052

+1 (540) 449-4770 [Cell]
+1 (425) 538-6245 [Work/FAX]
+1 (425) 704-8891 [Redmond]

yoges@microsoft.com

Background

I am interested in the end-to-end management of data on distributed platforms such as clusters, Clouds and Grids. Data management is particularly challenging in data-intensive sciences, where querying metadata, managing data repositories, tracking provenance and archiving experimental results are all central to the scientific process. I work on scientific workflow frameworks that address some of these challenges and provide a platform for in silico experiments. Clouds are emerging as a feasible platform for scalable scientific analyses, and I am exploring the scope of such eScience applications and the middleware needed to support them. Applying these tools to real science is a key goal, and toward it I have worked, and continue to work, with scientists in the meteorology, astronomy and genomics domains.

    Current & Recent Projects

    BioInformatics & Cloud Computing

    Coming soon: Microsoft Biology Initiative; Windows Azure; DryadLINQ.

    Pan-STARRS

    The Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) is a next-generation digital sky survey that builds on the success of the Sloan Digital Sky Survey (SDSS). Pan-STARRS is equipped with the world’s largest digital camera, and its data system uses SQL Server 2008, Windows HPC clusters, Windows Workflow Foundation and the Trident Scientific Workbench to handle the much larger data volume the survey generates (30 TB/year) and to make that data available to astronomers promptly (incrementally updated each week).

    This project is in collaboration with Alex Szalay of Johns Hopkins University and Jim Heasley of the University of Hawai’i. I was actively involved in incorporating scientific workflows to reliably automate the data pipeline that continuously loads processed telescope detections into science-ready databases.

    Trident Scientific Workflow Workbench

    The Trident Workbench provides a rich set of tools for running scientific workflows in the Cloud. Built on top of the Windows Workflow Foundation runtime, Trident adds a visual workflow composer, a service registry, provenance tracking and integration with the Windows HPC scheduler, which together make it an effective workbench for eScience in the Cloud. Originally designed for the NEPTUNE oceanography project, Trident is now being generalized to other scientific domains and is being used in the Pan-STARRS project.

    The Trident project was led by Roger Barga at Microsoft Research.

    New: Download and try the Project Trident CTP!

    Karma3 Provenance Framework

    The Karma provenance framework began as part of my Ph.D. research to build an effective, lightweight provenance collection system for scientific workflows, and was applied to the LEAD meteorology project. Development of Karma v3 continues under an NSF SDCI grant, with the goals of making Karma general purpose and using the captured provenance to automate workflow composition. This work will also make Karma compatible with the emerging Open Provenance Model specification.

    This project is in collaboration with Beth Plale and David Leake of Indiana University and Dennis Gannon of Microsoft Research. I am a Co-PI on the NSF SDCI award.

    Semantic Provenance in Life Sciences Grid

    The Life Science Grid (LSG) is an open-source plugin framework from Eli Lilly that lets researchers in the life sciences use information services, encapsulated as plugins, collaboratively for scientific research and discovery. This project extends LSG by capturing semantic provenance about users’ interactions with information sources through LSG. This provenance helps track the direction of research, supports collaborative research and provides a rich source for data mining. The project uses Karma for provenance capture and S-OGSA for semantic annotation and querying.

    This project was a collaboration with Beth Plale of Indiana University and Carole Goble of the University of Manchester, and was sponsored by Eli Lilly Pharmaceuticals.

    Awards

    Supercomputing 2008 Storage Challenge Winner

    GrayWulf: Scalable Cluster Architecture for Data Intensive Computing. Alexander Szalay, Maria Nieto-Santisteban, Jan Vandenberg, Alainna Wonders, Randal Burns, Eric Perlman, Ani Thakar, Mike McCarty and Dean Zariello (Johns Hopkins University); Gordon Bell, Tony Hey, Roger Barga, Yogesh Simmhan and Catherine van Ingen (Microsoft Research); Michael Thomassy and Lubor Kollar (Microsoft Corporation); Robert Grossman, David Hanley, Yunhong Gu and Michael Sabala (University of Illinois at Chicago); Jim Heasley (University of Hawaii); and Tim Carrol, Eric Barnes and Mike Rowland (Dell, Inc.).