Finding and retrieving relevant data can be a daunting and tedious task for environmental scientists and engineers. Microsoft External Research is developing an online search engine called SciScope that will make the job easier. SciScope enables researchers to search multiple data repositories simultaneously and retrieve information in a consistent format.
Managing a Wealth of Environmental Data
Environmental scientists and engineers are awash in data. The Internet is providing access to ever-increasing amounts of historical data. Meanwhile, more data continues to pour in, thanks in part to advances in satellite technology and inexpensive in-situ sensing devices.
While this deluge of information is opening new possibilities for scientific discovery, the diversity of data and data sources also poses challenges for scientists and engineers. For instance, a scientist trying to assemble a dataset for a particular research project typically has to go through multiple agencies to find relevant data. Moreover, that data is often compiled in widely varying formats, with different naming conventions, languages, and timeframes.
To help make this task less burdensome, Microsoft External Research is developing SciScope (www.sciscope.org), a Web-based search engine that can be used to locate meteorological, hydrological, and water and soil quality data from numerous data repositories and then retrieve it in a consistent format. SciScope users will be able to access data on a broad range of environmental measurements, from precipitation, snowpack, and stream flow to solar radiation, water quality, and biodiversity.
A sample view of a SciScope search. Using a variety of search features, users can query data from multiple repositories, then retrieve the information in a consistent format.
“If you’re a scientist, before you can even begin working with your data, you have to do all the work of making it compatible,” says Bora Beran, an environmental scientist who is overseeing the SciScope project at Microsoft External Research. “SciScope can dramatically reduce the time and effort necessary to discover and assemble a dataset.”
A Single Interactive Interface for Scientific Databases
SciScope melds two existing Microsoft technologies—Virtual Earth, a geospatial mapping platform, and SQL Server 2008, a database management program—into a single interactive interface.
Still in its beta version, SciScope already provides access to data from about 1.7 million sensors across the United States—in all, more than 358 million observational results. In essence, SciScope is a unified portal to the databases of the U.S. Geological Survey (USGS), Environmental Protection Agency (EPA), and National Climatic Data Center (NCDC). Data from smaller regional agencies and individual researchers is also being added.
SciScope enhances Virtual Earth’s imagery with additional map layers that enable users to view features such as aquifers, watersheds, and geology. SciScope users typically begin a data search by identifying a geographical area (either by selecting a particular geographical feature or by using a drawing tool to define the area) and setting a timeframe. They can then enter keywords for the specific type of data they are seeking. Once SciScope identifies relevant data sources, users can further refine their search and then download data directly using the interface. The data is provided in a Microsoft Office Excel format that includes contact information for the original sources.
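The search steps described above (area, timeframe, keyword) can be illustrated with a small Python sketch. This is not SciScope's implementation, which the article does not describe at the code level. The `Station` record, the `search` function, and every station listed are hypothetical, and the selected area is simplified to a latitude/longitude bounding box.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Station:
    """Hypothetical catalog entry for one sensor; not a SciScope schema."""
    station_id: str
    agency: str
    lat: float
    lon: float
    variable: str
    first_obs: date
    last_obs: date


def search(stations, bbox, start, end, keyword):
    """Return stations inside bbox that measured `keyword` between start and end.

    bbox is (min_lat, min_lon, max_lat, max_lon).
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    kw = keyword.lower()
    return [
        s for s in stations
        if min_lat <= s.lat <= max_lat
        and min_lon <= s.lon <= max_lon
        # Keep stations whose observation period overlaps the requested window.
        and s.first_obs <= end and s.last_obs >= start
        and kw in s.variable.lower()
    ]


# Made-up example records, for illustration only.
STATIONS = [
    Station("A-001", "USGS", 35.2, -77.6, "Stream flow",
            date(1990, 1, 1), date(2008, 12, 31)),
    Station("B-002", "EPA", 35.4, -78.1, "Total nitrogen",
            date(2001, 1, 1), date(2005, 6, 30)),
    Station("C-003", "NCDC", 40.7, -74.0, "Precipitation",
            date(1950, 1, 1), date(2008, 12, 31)),
]

if __name__ == "__main__":
    area = (34.5, -79.0, 36.5, -76.0)
    for s in search(STATIONS, area, date(2004, 1, 1), date(2006, 1, 1), "nitrogen"):
        print(s.station_id, s.agency, s.variable)
```

A real system would test against the actual outline of a watershed or a drawn polygon rather than a rectangle; the bounding box only keeps the sketch short.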
Beran, who earned a Ph.D. in hydroinformatics before joining Microsoft External Research last year, believes SciScope will become an important tool for scientists and engineers.
Brian Gallagher, who runs a private environmental science company in Los Angeles and has been working in the field for nearly four decades, says he sees “enormous potential” for SciScope. “We can’t really solve environmental problems or do natural resource development properly without good data, and frankly, it’s hard to find,” he says.
“As scientists, we probably spend 75 percent of our time looking for data. SciScope will help us find data faster, or at least let us know when the data just isn’t there.”
Making Data Widely Available
To illustrate how SciScope could make the data search process more efficient, Beran paints a hypothetical scenario of a scientist studying eutrophication—the over-enrichment of water bodies—in North Carolina’s Neuse River Basin.
Without SciScope, Beran says, the scientist would have to locate potential data sources and then identify all of the relevant observation locations for the Neuse River area—a painstaking process in itself. He would then have to separately search for data for the various indicators, such as nitrogen, phosphorus, turbidity and algae concentrations. Moreover, since the data would likely come from multiple agencies, the scientist would have to reconcile differences in formats and data language before beginning his analysis.
With SciScope, the scientist can simply select the Neuse River Basin by clicking on that area of the map, type “eutrophication” in the search window, set a timeframe and then start the search. SciScope translates this into a query for 59 parameters from four separate data repositories. Once the results are displayed on the map, the scientist can select which data sites he wants to add to SciScope’s Data Cart for retrieval.
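One way to picture how a single keyword can turn into a multi-parameter, multi-repository query is a two-level lookup: first from the concept to the indicators it covers, then from each indicator to the code each repository uses for it. The Python sketch below is an assumption-laden illustration, not SciScope's actual translation logic. The repository names and parameter codes are invented, and the indicator list is limited to the four named earlier in the scenario rather than the 59 parameters the real query uses.

```python
# Hypothetical mapping from a search concept to the indicators it covers.
CONCEPT_TO_PARAMETERS = {
    "eutrophication": ["nitrogen", "phosphorus", "turbidity", "algae"],
}

# Hypothetical per-repository vocabularies; names and codes are made up.
REPOSITORY_CODES = {
    "repo_a": {"nitrogen": "A_NITROGEN", "phosphorus": "A_PHOSPHORUS"},
    "repo_b": {"nitrogen": "B_TOTAL_N", "turbidity": "B_TURB",
               "algae": "B_CHLOROPHYLL"},
}


def expand_query(keyword):
    """Translate one keyword into per-repository lists of parameter codes."""
    term = keyword.strip().lower()
    # Unknown keywords fall back to a literal single-parameter search.
    parameters = CONCEPT_TO_PARAMETERS.get(term, [term])
    plan = {}
    for repo, codes in REPOSITORY_CODES.items():
        matched = [codes[p] for p in parameters if p in codes]
        if matched:  # Skip repositories that hold none of the parameters.
            plan[repo] = matched
    return plan


if __name__ == "__main__":
    for repo, codes in expand_query("eutrophication").items():
        print(repo, codes)
```

Keeping the concept mapping separate from the repository vocabularies means a new data source can be added by supplying only its own code table, without touching the search concepts.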
Expanding the Potential Applications of SciScope
SciScope, which Beran has been demonstrating at science meetings around the United States, remains a work in progress. “We will be adding new features all the time, so SciScope will probably remain in beta version for quite some time,” he says. An upcoming version will enable users to sign up to receive new data at specified intervals or automatic notifications when new relevant sensors come online.
SciScope could eventually have applications beyond science, Beran says. For instance, river kayakers could use it to do real-time checks of water levels in a particular river. Alternatively, someone looking to purchase property could use SciScope to find all sorts of information about the local environment.
Beran believes that, if successfully applied, Web 2.0 principles—collaboration and interactivity using the Internet as a platform—could enable SciScope to bring together unprecedented amounts of observational data. “There’s a lot of data out there collected by individual researchers, small groups, and universities that often is not available on the Internet,” says Beran. SciScope can serve as a publishing platform for such data.
“Besides making data more widely available, data sharing through SciScope has the potential to lead to all sorts of new research collaborations,” Beran says.
A Microsoft Research Connections-funded project supporting advanced technology research
- Bora Beran, Ph.D., Microsoft Research