Yogesh Simmhan


POSTDOC RESEARCHER

eScience Group

Microsoft Research eScience Group

Los Angeles Office
1100 Glendon Ave, Suite 1080,
Los Angeles, CA 90024
Redmond Office
B99/4611, 14820 NE 36th St,
Redmond, WA 98052

+1 (540) 449-4770 [Cell]
+1 (425) 538-6245 [Work/FAX]
+1 (425) 704-8891 [Redmond]
yoges@microsoft.com

Background

I am interested in end-to-end management of data in distributed systems such as Grids and Clouds. These data management problems are particularly challenging for scientific datasets, where managing metadata, searching and discovering data, tracking data provenance, evaluating data quality and archiving experimental results are especially important. I am also interested in scientific workflow frameworks, which are becoming standard actors that consume and generate scientific data and help model complex, data-intensive scientific experiments in silico that can be executed in the Cloud.

Research Interests

Data Provenance

Scientific Workflows

Recent Publications (Peer Reviewed)

All publications...

Selected Talks

On End-to-End Scientific Data Management using Workflows. Invited talk at the Scientific Workflows Workshop, 2008.

Transforming Scientific Research through Cloud Technology. Talk at the Indian Institute of Science, Bangalore, 2008.

Cloud Computing: A Technical Overview. Tutorial at the MSR eScience Workshop, Indianapolis, 2008. [Slides] [C# Code] [Java Code]

Awards

Supercomputing 2008 Storage Challenge Winner. GrayWulf: Scalable Cluster Architecture for Data Intensive Computing. Alexander Szalay, Maria Nieto-Santisteban, Jan Vandenberg, Alainna Wonders, Randal Burns, Eric Perlman, Ani Thakar, Mike McCarty and Dean Zariello (Johns Hopkins University); Gordon Bell, Tony Hey, Roger Barga, Yogesh Simmhan and Catherine Van Ingen (Microsoft Research); Michael Thomassy and Lubor Kollar (Microsoft Corporation); Robert Grossman, David Hanley, Yunhong Gu and Michael Sabala (University of Illinois at Chicago); Jim Heasley (University of Hawaii); and Tim Carrol, Eric Barnes and Mike Rowland (Dell, Inc.).

Service

Provenance Challenge Workshop, 2009. Organizing Committee member. 

Workshop on Semantic Web and Provenance Management, 2009. Program Committee member.

Scientific Workflow Workshop, 2007-09. Program Committee member.

 

Current Projects

Pan-STARRS

The Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) is a next-generation digital sky survey that builds on the success of the Sloan Digital Sky Survey (SDSS). Equipped with the world’s largest digital camera, the system leverages SQL Server 2008, Windows Workflow Foundation and the Trident Scientific Workbench to handle the much larger data volume generated by Pan-STARRS (30 TB/year) and to make that data available to astronomers promptly, with incremental updates each week.

This project is in collaboration with Alex Szalay of Johns Hopkins University and Jim Heasley of University of Hawai’i. I am actively involved in incorporating scientific workflows to automate the data pipeline that continuously loads processed telescope detections into science-ready databases.

Trident Scientific Workflow Workbench

The Trident Workbench provides a rich set of tools to run scientific workflows in the Cloud. Built on top of the Windows Workflow Foundation runtime, Trident adds tools such as a visual workflow composer, a service registry, provenance tracking and integration with the Windows HPC scheduler, which make it an effective workbench for eScience in the Cloud. Originally designed for the NEPTUNE Oceanography project, Trident is now being generalized to other scientific domains and is being used in the Pan-STARRS project.

The Trident project is led by Roger Barga at Microsoft Research. I am working on the provenance collection aspects of Trident and on driving the design of the framework through its application in Pan-STARRS.

Karma3 Provenance Framework

The Karma provenance framework was initiated as part of my Ph.D. research to build an effective, lightweight provenance collection system for scientific workflows, and it was applied to the LEAD meteorology project. Development on Karma continues, funded by an NSF SDCI grant, to make Karma general purpose and to use the captured provenance to automate workflow composition. This work will also make Karma compatible with the emerging Open Provenance Model specification.

This project is in collaboration with Beth Plale and David Leake of Indiana University and Dennis Gannon of Microsoft Research. I am a Co-PI on the NSF SDCI award.

Semantic Provenance in Life Sciences Grid

The Life Science Grid (LSG) is an open-source plugin framework from Eli Lilly that allows researchers in the Life Science domain to use information services, encapsulated as plugins, in a collaborative manner to perform scientific research and discovery. This project extends the capabilities of LSG by capturing semantic provenance on user interactions with the information sources through LSG. The captured provenance helps track research direction, supports collaborative research and presents a rich source for data mining. The project uses Karma for the provenance capture and S-OGSA for semantic annotations and querying.

This project is in collaboration with Beth Plale of Indiana University and Carole Goble of University of Manchester, and sponsored by Eli Lilly Pharmaceuticals.