Speaker Beth Plale
Affiliation Indiana University
Host Dan Fay
Date recorded 7 October 2005
Doppler radar data, which has proven its value in meteorology research, has tremendous potential for use in many other research endeavors, if only it weren’t so difficult to work with. In DopplerSource we are removing the hurdles that prevent broader use of the data through a service-based framework for storing, operating on, and serving the data. The 130 WSR-88D (Doppler) radars located throughout the United States generate Level II data continuously, 24x7. The data has been valuable in many aspects of meteorology research and education, for instance in real-time warning of hazardous spring and winter weather, in initializing numerical weather prediction models, and in verifying the occurrence of past events, such as the location of damaging hail. But it has broader potential. Level II data is also used in bird and insect migration studies, bird strike avoidance, urban pollution transport, and the tracking of hazardous atmospheric releases. This larger goal of opening additional avenues of science cannot be fully realized without significant improvements in the accessibility and availability of the data over what exists today.
In this project, partially funded through Microsoft e-Science, we are constructing a .NET framework for storing, operating on, and serving NEXRAD Level II data and the knowledge products derived from the data. Our pilot project is aimed at the six nearest radars surrounding Bloomington, Indiana. The project focus areas are:
Storing and indexing large volumes of streaming data using a SQL Server database
Generating metadata on the fly to describe the data and capture features of the time sequence in which the data arrived
Simple retrieval of Doppler data through a spatial-temporal interface: the user selects a region of interest and specifies a temporal range
Support services to query, process, clean, filter, and fuse data on the fly
Authentication mechanisms to prevent denial-of-service abuse that over-taxes the computational resource
Scalability: a level of performance that balances continuous input stream arrival, computationally intense user services, and rich query access over highly correlated temporal and spatial data
Log analysis to characterize data arrival and anticipate user workload. Logs from related meteorology services are analyzed for patterns of use that help us anticipate future usage

The storage needs for the pilot radars alone are substantial. The six radars generate 27.5 TB per year of raw Level II data, which can be compressed to 1/25th of its size, requiring about 1 TB/yr of storage. A useful transformation of the data is into the binary netCDF format; the converted data adds another 2.5 TB/yr. The arriving data products are tagged with metadata to facilitate searching, and the metadata needs for the pilot data products are estimated at 170 GB/yr. The knowledge products generated on demand by statistical analysis and data mining services are estimated at 0.5 TB/yr. This places the total storage need at roughly 4.5 TB/yr.

The tools used include a web service framework (.NET), a database management system (SQL Server), an XML metadata schema (leveraging the LEAD Metadata Schema from the NSF LEAD project), and Integrated Radar Data Services (IRaDS) support for the Doppler streams.
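The storage estimates above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the DopplerSource framework. The variable and function names are hypothetical, and it assumes 1 TB = 1000 GB when converting the metadata figure.

```python
# Yearly storage budget for the six pilot radars, in TB.
# Every figure below is an estimate quoted in the abstract.
RAW_TB_PER_YEAR = 27.5     # uncompressed Level II data
COMPRESSION_FACTOR = 25    # raw data compresses to 1/25th of its size

components_tb = {
    "compressed Level II": RAW_TB_PER_YEAR / COMPRESSION_FACTOR,
    "netCDF conversions": 2.5,
    "metadata": 170 / 1000,   # 170 GB, assuming 1 TB = 1000 GB
    "knowledge products": 0.5,
}


def total_tb(components):
    """Sum the per-component yearly storage estimates."""
    return sum(components.values())


if __name__ == "__main__":
    for name, tb in components_tb.items():
        print(f"{name:22s} {tb:5.2f} TB/yr")
    print(f"{'itemized total':22s} {total_tb(components_tb):5.2f} TB/yr")
```

The compressed Level II stream comes to 1.1 TB/yr, and the itemized components sum to about 4.27 TB/yr, somewhat under the 4.5 TB/yr quoted. The abstract does not explain the difference; rounding or planning headroom is one plausible reading, but that is a guess.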
The hardware testbed includes 16 dual-Opteron machines with 16 GB RAM each, a 3.5 TB SAN storage array, a database server (dual Opteron, 4 GB RAM, 2 TB RAID 1 disk, Windows 2003), and the Indiana University MDSS fault-tolerant mass store server with a collective 1 petabyte of storage.
©2005 Microsoft Corporation. All rights reserved.