A graduate student’s computer graphics project to help oceanographers visually manage sensor data has evolved into a potentially game-changing approach to scientific workflow. Scientists at the University of Washington (UW) are working with Microsoft Research Connections to demonstrate how marrying visualization and workflow technologies can allow researchers to better manage, evaluate, and interact with even the most complex scientific datasets.
Making Scientific Data Easier to Access, Reproduce, and Share
Imagine an oceanographer looking at real-time data from a sensor that is observing the eruption of an undersea volcano, an ephemeral event. A quick scan of different depictions of the data—a bar chart here, a color-coded grid there—indicates to the scientist that something is amiss. She opens up a workflow tool, tweaks a parameter even as the information pours in, and restores the integrity of the valuable data.
“You used to get one shot at these kinds of things,” says Roger Barga, principal architect at Microsoft Research Connections and the leader of Project Trident: A Scientific Workflow Workbench, which aims to make complex scientific data visually manageable. The project uses a new tool being developed by Microsoft and UW researchers to allow scientists to “interact with the data in a very fluid way,” says Barga.
Project Trident is helping scientists to manage data-intensive projects such as the Ocean Observatories Initiative, which is creating cabled observatories off the U.S. coast. (Image courtesy of the Center for Environmental Visualization, University of Washington)
Project Trident had its beginnings in work undertaken by Keith Grochow, a UW doctoral student in computer science. Grochow was exploring the use of computer visualization to assist UW oceanographers who are part of a seafloor-based research network called the Ocean Observatories Initiative (OOI), also known as Neptune. Sponsored by the National Science Foundation, the US$400 million OOI will produce massive amounts of data from thousands of ocean-based sensors off the U.S. coast.
At Microsoft, meanwhile, Barga was looking to expand the use of Windows Workflow Foundation, a workflow tool based on the Microsoft .NET Framework that had been largely focused on business applications. When Jim Gray, a visionary computer scientist at Microsoft Research, learned about Grochow’s work with the Neptune project, he connected Barga with Ed Lazowska and Mark Stoermer, Grochow’s Ph.D. supervisors at the University of Washington.
The resulting collaboration, which also involves the Monterey Bay Aquarium and other OOI participants, has focused on adapting Windows Workflow Foundation for scientific workflow purposes using Microsoft SQL Server 2008 and Windows HPC cluster technologies. The goal is to allow researchers to dive much deeper into large-scale projects and perform analyses of problems previously regarded as impenetrably complex. The new tool, which exploits the powerful graphics capabilities of today’s computers, will make scientific data not only more transparent, but also more easily reproducible and more easily shared.
Taking Advantage of Computer Gaming Visualization Advances
“The gaming industry has created this amazing graphics engine available on every PC, yet the resource has been largely overlooked by the scientific community,” says Grochow, whose doctoral thesis will be based on the project. Many scientific tasks that have required cumbersome text and formula entries can be more easily accomplished using the kinds of graphical tools that allow gamers to battle monsters or fly a virtual jet, he says.
The UW effort to visualize the Neptune data, called COVE, or Collaborative Observatory Visualization Environment, was running out of funding when the collaboration with Microsoft took hold. Microsoft’s financial and technical support has allowed COVE to thrive, says Stoermer, who is director of the UW’s Center for Environmental Visualization.
“COVE really is about taking a gaming perspective to research,” says Stoermer. “And in the long run, we see this as applicable well beyond oceanography.” The approach is already being adapted to study watershed runoff patterns around Puget Sound as part of an effort to protect the marine environment from pollutants, he says. Barga’s team at Microsoft is also collaborating with astronomers to ensure that the tool is broadly applicable across scientific disciplines.
Visualizing the Scientific Workflow
Most scientists haven’t thought much about workflow, says Barga, nor do they have much sense of what it means beyond “how work gets done.” For computer scientists, however, workflow refers to detailed code specifications for running and coordinating a sequence of events. It can be a simple, linear sequence (the classic flow chart, for example) or a conditional, many-branched series of events linked together and interacting within complex feedback loops.
Incorporating workflow visualization into science will make managing large-scale, complex projects easier, says Barga. It also has the potential to greatly improve the efficiency of research. Scientists currently spend a huge amount of time and resources attempting to validate or replicate other researchers’ findings. The workflow tool will capture every step or alteration in even the most complex study, he says, allowing any other researcher to check every data point or rerun the experiment “virtually,” with different parameters.
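To make the idea of capturing every step and rerunning a study with different parameters concrete, here is a minimal sketch in Python. It is not Project Trident or Windows Workflow Foundation code. The `Step` and `Workflow` classes, the three processing functions, and the sample readings are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    """One named stage of a workflow with its default parameters (hypothetical)."""
    name: str
    func: Callable[..., Any]
    params: dict = field(default_factory=dict)


@dataclass
class Workflow:
    """A simple linear workflow that records what each step did (hypothetical)."""
    steps: list
    log: list = field(default_factory=list)

    def run(self, data, overrides=None):
        """Run every step in order, logging the parameters and output of each."""
        overrides = overrides or {}
        self.log = []
        for step in self.steps:
            # Per-run overrides replace a step's defaults without editing the workflow.
            params = {**step.params, **overrides.get(step.name, {})}
            data = step.func(data, **params)
            self.log.append({"step": step.name, "params": params, "output": data})
        return data


# Illustrative processing stages for a list of sensor readings.
def calibrate(readings, offset):
    return [r + offset for r in readings]


def drop_outliers(readings, max_value):
    return [r for r in readings if r <= max_value]


def mean(readings):
    return sum(readings) / len(readings)


workflow = Workflow([
    Step("calibrate", calibrate, {"offset": 0.5}),
    Step("drop_outliers", drop_outliers, {"max_value": 10.0}),
    Step("mean", mean),
])

raw = [2.0, 3.0, 4.0, 50.0]  # made-up readings

first = workflow.run(raw)
first_log = workflow.log

# Rerun the same workflow "virtually" with a looser outlier threshold.
second = workflow.run(raw, overrides={"drop_outliers": {"max_value": 100.0}})
second_log = workflow.log

print(first, second)
```

Because each run leaves behind a log of step names, parameters, and intermediate outputs, another researcher could in principle inspect exactly how a result was produced and repeat it with one setting changed. A real scientific workflow system would also need branching, persistence, and distributed execution, none of which this sketch attempts.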
“This really is going to be a game-changer,” says Barga.
A Microsoft Research Connections-funded project supporting advanced technology research
Roger Barga, Principal Architect, Microsoft Research Connections
Mark Stoermer, director, Center for Environmental Visualization, University of Washington
Keith Grochow, doctoral student, Computer Science and Engineering, University of Washington