Executive Summary

This report summarizes the study of the EOSDIS architecture performed by the Project Sequoia 2000 team. Herein, we focus on a system to collect, store, process, and distribute raw feed data from EDOS. We also estimate the costs of obtaining and managing the required software and hardware, as well as the cost of operating the complex for 10 years and providing user support. In this part of the report, we summarize our major conclusions.

EOSDIS should focus on ad hoc queries, not standard products.

Traditional NASA systems have focused on defining a collection of standard products that Earth Science users would order for electronic or manual delivery. In our view, EOSDIS is a vast information repository, and users will move to a model in which they submit requests for exactly the information they desire. This leads to a preponderance of so-called ad hoc queries to which EOSDIS systems must respond.

If a sufficient number of users request common information, it will be worthwhile to precompute such data and package them in standard products. However, this is strictly an optimization technique to save computing resources and speed response time.

Our architecture is novel in that NASA can dynamically adjust which standard products are generated, and how many, to meet current needs. Consequently, no costly up-front debate is required over exactly which standard products to generate.
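The idea that standard products are merely an optimization can be made concrete with a small frequency-based rule: count incoming requests and promote any query requested often enough to a precomputed product. The sketch below is illustrative only; the threshold of 3 and the query names are hypothetical and do not come from the report's user studies.

```python
from collections import Counter

# Hypothetical promotion threshold: a query requested at least this many times
# in the current window is worth precomputing as a "standard product".
PROMOTE_THRESHOLD = 3


def choose_standard_products(request_log, threshold=PROMOTE_THRESHOLD):
    """Return the set of query keys requested often enough to precompute."""
    counts = Counter(request_log)
    return {query for query, n in counts.items() if n >= threshold}


# Hypothetical request log; every entry is an ad hoc query key.
log = ["sst_monthly", "ndvi_weekly", "sst_monthly", "ozone_daily",
       "sst_monthly", "ndvi_weekly", "ndvi_weekly"]
print(sorted(choose_standard_products(log)))  # ['ndvi_weekly', 'sst_monthly']
```

Because the set is recomputed from the log, products drop out of the precomputed set as demand shifts, which is the dynamic adjustment described above.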

In this report, we present several user studies indicating that 80-90% of EOSDIS requests will be ad hoc queries and only 10-20% will be requests for standard products.

EOSDIS should have a different model for DAACs and SCFs.

To process queries, the current EOSDIS design assumes that 8 DAACs will store raw data and compute a collection of standard products, and that a collection of SCFs will perform further processing.

In contrast, we recommend that EOSDIS construct exactly 2 "superDAACs," each of which will store the entire collection of raw data. These 2 superDAACs will also precompute the results of commonly requested queries and perform some ad hoc queries.

In addition, we recommend that EOSDIS construct N additional "peerDAACs," which perform ad hoc queries and store data sets commonly requested by a particular subset of the Earth Science community. PeerDAACs are much smaller and cheaper than superDAACs. They also have less stringent reliability requirements, because if a peerDAAC fails, its stored data can be regenerated from the raw data held at the superDAACs.
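The reason peerDAACs can tolerate failure is that everything they store is derived, and can be rebuilt from the superDAACs' raw archive. A minimal sketch of that relationship follows; the class names, the "recipe" mapping from a product to its raw data set and derivation function, and the sample values are all hypothetical illustrations, not part of the proposed design.

```python
class SuperDAAC:
    """Stand-in for a superDAAC: holds the complete raw-data archive."""

    def __init__(self, raw):
        self.raw = raw  # raw data-set name -> list of raw values

    def derive(self, raw_name, fn):
        # Recompute a derived data set directly from the raw archive.
        return fn(self.raw[raw_name])


class PeerDAAC:
    """Stand-in for a peerDAAC: caches derived data sets; losing them is recoverable."""

    def __init__(self, super_daac, recipes):
        self.super_daac = super_daac
        self.recipes = recipes  # product name -> (raw data-set name, derivation function)
        self.store = {}

    def get(self, product):
        if product not in self.store:  # never built here, or lost in a failure
            raw_name, fn = self.recipes[product]
            self.store[product] = self.super_daac.derive(raw_name, fn)
        return self.store[product]

    def fail(self):
        self.store.clear()  # simulate a peerDAAC losing all of its local data


super_daac = SuperDAAC({"sst_raw": [10.0, 12.0, 14.0]})
peer = PeerDAAC(super_daac, {"mean_sst": ("sst_raw", lambda xs: sum(xs) / len(xs))})
print(peer.get("mean_sst"))  # 12.0
peer.fail()
print(peer.get("mean_sst"))  # 12.0, regenerated from the superDAAC's raw data
```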

All systems run the same software, yielding a model with 2+N DAACs. In this report, we specify, for each kind of DAAC, hardware available in 1994 and its cost. Our cost estimates assume a 2+250 configuration, that is, 2 superDAACs and 250 peerDAACs.

Our 2 superDAACs are vastly cheaper to operate than the 8 current DAACs. Moreover, peerDAACs can be run with minimal human staffing. Consequently, our model is cheaper to operate than the current NASA model. We estimate that moving to our model would save $88M in operations costs over the period 1998-2003.

EOSDIS should do just-in-time hardware acquisition.

In this report, we carefully predict technology trends in CPU, disk, tape, and network technology. Since all are getting cheaper at a rapid rate, it is imperative that EOSDIS purchase hardware just in time. We assume that an initial EOSDIS system is procured in late 1997 and is expanded incrementally through the year 2003.

We assume that each hardware procurement pays the expected cost of the technology in the year it is acquired. With just-in-time procurement, NASA can obtain suitable processing capability for the 2+N DAACs for about $45M, a savings of $121M relative to current EOSDIS budgets.
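The advantage of just-in-time procurement follows from falling unit prices: capacity bought in the year it is needed costs less than the same capacity bought up front. The sketch below compares the two strategies under a constant annual price decline. The 30%/year decline, the unit price, and the capacity plan are hypothetical numbers chosen for illustration; they are not the technology forecasts or budgets used in this report.

```python
def procurement_cost(increments, unit_price, annual_decline, base_year, buy_year=None):
    """Total cost of acquiring capacity increments under a constant price decline.

    increments     -- {year needed: capacity units}
    unit_price     -- price per unit of capacity in base_year
    annual_decline -- fractional price drop per year (e.g. 0.3 = 30%/year)
    buy_year       -- None: buy each increment in the year it is needed (just in time);
                      otherwise: buy every increment up front in buy_year
    """
    total = 0.0
    for year, units in increments.items():
        purchase_year = year if buy_year is None else buy_year
        price = unit_price * (1 - annual_decline) ** (purchase_year - base_year)
        total += units * price
    return total


# Hypothetical growth plan: 10 capacity units needed in each of four years.
plan = {1997: 10, 1999: 10, 2001: 10, 2003: 10}
up_front = procurement_cost(plan, 1.0, 0.3, 1997, buy_year=1997)
just_in_time = procurement_cost(plan, 1.0, 0.3, 1997)
print(up_front, round(just_in_time, 2))  # 40.0 18.48
```

Under these assumptions, deferring each purchase to the year it is needed cuts the hardware bill by more than half; the actual savings depend entirely on the real price trends.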

EOSDIS should adopt a DBMS-centric viewpoint.

Consistent with a preponderance of ad hoc queries, we recommend that EOSDIS adopt a DBMS-centric viewpoint. To accomplish this, we recommend using an advanced object-relational DBMS as a local storage and processing engine. This engine will run an enhanced dialect of SQL, which we term SQL-*.

In addition, EOSDIS must construct "middleware" to allocate queries and data storage to the 2+N processing sites. This middleware must also decide which objects to precompute and must respond efficiently to queries that may require accessing the entire data store.
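One part of the middleware's job, allocating queries to sites, can be sketched as a simple routing policy: send a query to a peerDAAC that already stores the requested data set, and otherwise fall back to the less-loaded superDAAC, since superDAACs hold all of the raw data. This policy, the `Site` record, and the site and data-set names are hypothetical; a real allocator would also weigh data movement and query cost.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    kind: str                                  # "super" or "peer"
    datasets: set = field(default_factory=set)  # data sets stored locally
    load: int = 0                              # queries currently assigned


def route(dataset, sites):
    """Pick a site for a query over `dataset` (hypothetical policy)."""
    peers = [s for s in sites if s.kind == "peer" and dataset in s.datasets]
    supers = [s for s in sites if s.kind == "super"]
    # SuperDAACs store all raw data, so they can serve any query as a fallback.
    candidates = peers or supers
    chosen = min(candidates, key=lambda s: s.load)  # least-loaded; ties go to the first
    chosen.load += 1
    return chosen.name


sites = [Site("superA", "super"), Site("superB", "super"),
         Site("peer1", "peer", {"ocean_color"})]
picks = [route("ocean_color", sites), route("snow_cover", sites), route("snow_cover", sites)]
print(picks)  # ['peer1', 'superA', 'superB']
```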

Moreover, EOSDIS must focus considerable energy on database design and on customizing SQL-* to meet Earth Science needs (so-called type extension). In addition, the communication protocols between client sites and middleware, and between middleware and local servers, should be optimized for SQL. For this reason, Z39.50 and CORBA are inappropriate technical choices; rather, the focus should be on a standard SQL application programming interface (API).
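To give a feel for type extension through a standard SQL API, the sketch below uses Python's DB-API with the built-in `sqlite3` module as a stand-in engine: it registers a user-defined geographic function and then calls it inside an ordinary SQL query. SQLite is not an object-relational SQL-* DBMS, and a scalar function is a much weaker form of extension than new data types; the function name `great_circle_km`, the `scenes` table, and the coordinates are hypothetical.

```python
import math
import sqlite3


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL queries can call it (stand-in for type extension).
conn.create_function("great_circle_km", 4, great_circle_km)
conn.execute("CREATE TABLE scenes (id INTEGER PRIMARY KEY, lat REAL, lon REAL)")
conn.executemany("INSERT INTO scenes (lat, lon) VALUES (?, ?)",
                 [(37.87, -122.27), (34.05, -118.24), (47.61, -122.33)])

# Ad hoc query: scenes whose center lies within 800 km of (37.87, -122.27).
rows = conn.execute(
    "SELECT id FROM scenes WHERE great_circle_km(lat, lon, ?, ?) < ? ORDER BY id",
    (37.87, -122.27, 800.0)).fetchall()
print(rows)  # [(1,), (2,)]
```

The client sees only SQL text and bound parameters, which is the kind of interface the recommendation favors over Z39.50 or CORBA.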

There are four major benefits of this architecture. First, it simplifies the software that must be written, thereby lowering the total software development cost. Second, it allows a different user-support model. These two benefits are discussed in the next two sections. Third, SQL DBMS technology is, in our opinion, the best vehicle for ensuring the evolution and portability of the EOSDIS system. Fourth, through the use of views and triggers, SQL DBMS technology is best able to support dynamic optimization of which standard products to precompute.
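The claim that views and triggers support dynamic precomputation can be illustrated with a small example, again using `sqlite3` as a stand-in engine. A candidate standard product is defined once as a view over raw observations and answered on demand; a trigger on a request log flags the product once it has been requested often enough, after which it is materialized into a table. The threshold of 3 requests, the table names, and the data are hypothetical, and the final materialization step stands in for what the middleware would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE observations (region TEXT, month INTEGER, sst REAL);
CREATE TABLE request_log (product TEXT);
CREATE TABLE to_materialize (product TEXT PRIMARY KEY);

-- A candidate standard product, defined once as a view over raw data.
CREATE VIEW monthly_sst AS
    SELECT region, month, AVG(sst) AS mean_sst
    FROM observations GROUP BY region, month;

-- Hypothetical policy: flag a product for precomputation after 3 requests.
CREATE TRIGGER promote_popular AFTER INSERT ON request_log
WHEN (SELECT COUNT(*) FROM request_log WHERE product = NEW.product) >= 3
BEGIN
    INSERT OR IGNORE INTO to_materialize VALUES (NEW.product);
END;
""")

conn.executemany("INSERT INTO observations VALUES (?, ?, ?)",
                 [("pacific", 1, 15.0), ("pacific", 1, 17.0), ("atlantic", 1, 12.0)])

for _ in range(3):
    conn.execute("INSERT INTO request_log VALUES ('monthly_sst')")  # log the request
    result = conn.execute("SELECT * FROM monthly_sst ORDER BY region").fetchall()

print(result)  # [('atlantic', 1, 12.0), ('pacific', 1, 16.0)]

# Materialize every flagged view. Identifiers cannot be bound as parameters,
# so names are taken only from our own to_materialize table.
for (product,) in conn.execute("SELECT product FROM to_materialize").fetchall():
    conn.execute(f"CREATE TABLE {product}_precomputed AS SELECT * FROM {product}")
```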

EOSDIS should focus on COTS software.

In our opinion, previous governmental systems have erred by building large, proprietary, hard-to-maintain, one-off systems. In contrast, adopting commercial off-the-shelf (COTS) software allows one to share development and maintenance costs with many other users.

Therefore, EOSDIS should purchase an SQL-* DBMS from a commercial vendor. By 1997, there will be several reasonable vendor choices. In addition, EOSDIS should purchase COTS system management tools based on the SNMP protocol.

In some areas, specifically hierarchical storage management (HSM) and SQL-* middleware, current and projected COTS products will not meet EOSDIS needs in a timely fashion. The contractor should not use this as an excuse to build a proprietary solution. Rather, EOSDIS should contract with 2 vendors of HSM software to accelerate the development of their products in the ways EOSDIS requires. Using two vendors allows for multiple architectural efforts and lowers the risk of complete failure. Furthermore, we recommend that EOSDIS contract with 2-3 SQL middleware vendors to accelerate their product timelines to meet EOSDIS needs.

Using this approach, EOSDIS can be built with very little proprietary software. The only major proprietary pieces are a type library for geographic objects and some job-scheduling middleware. We estimate that the software development, maintenance, and acquisition budget can be decreased to $50M, a savings of $107M relative to the current budget of $157M.

EOSDIS can use a different model for user support.

If all data are stored in a general-purpose DBMS, they must conform to the database schema in use. This schema is documented in the data dictionary and can be accessed online by users through browsing tools. Moreover, help information and bulletin boards can be designed to make the information in the database palatable using Mosaic and the World Wide Web (WWW). The contractor should focus on online help rather than on writing a polished hardcopy user manual, and should encourage users to ask questions by e-mail. We view a call to 1-800-HELP-EOS as a last resort.

Using this model, we expect technical support costs to decrease dramatically. Instead of a human-intensive support system, we propose moving to an electronic-intensive one. These savings are included in the operations cost estimate.

In summary, by accepting this collection of suggestions, the cost of EOSDIS could be reduced by $316M ($88M in operations, $121M in hardware, and $107M in software). Although we did not look at other parts of EOS (most notably the data-acquisition component), we expect that similar savings could be achieved through careful application of our approach.