Grid-based data mining with Environmental Scenario Search Engine

  • Mikhail Zhizhin
  • Alexey Poyda
  • Dmitry Mishin
  • Dmitry Medvedev
  • Eric Kihn

in Data mining techniques in grid computing environments

Published by Wiley | 2009

The increasing data volumes from today’s collection systems and the need of the scientific community to include an integrated and authoritative representation of the natural environment in its analyses require a new approach to data mining, management and access. The natural environment includes elements from multiple domains such as space, terrestrial weather, oceans and terrain. Systems such as the Global Change Master Directory (GCMD) from NASA and the Master Environmental Library (MEL) from the DMSO can search metadata by keyword and return a set of links to archived environmental data sets distributed across the network, but they cannot search for specific patterns within the data themselves. The environmental modelling community has begun to develop several archives of continuous environmental representations. These archives contain a complete view of the Earth system parameters on a regular grid for a considerable period of time. The numerical models used to reproduce environmental parameters take all available observational data as initial conditions, so the resulting petabyte-size data sets may be considered an authoritative high-resolution representation of terrestrial weather during the last 50 years (Kalnay et al., 1996; Uppala et al., 2005). This chapter describes the Environmental Scenario Search Engine (ESSE) for data grids, which provides uniform access to heterogeneous distributed environmental data archives and allows the data to be queried using human linguistic terms. A set of related software tools leverages the ESSE capabilities to integrate and explore environmental data in a new and seamless way. The ESSE software is available for download from the Internet.