Appendix 6: Detailed Hardware Design

This appendix provides detailed hardware designs for the superDAAC and peerDAAC architectures presented in Section 4. Section A6.1 lists the attributes of the hardware platforms available in 1994. Section A6.2 presents 3 architecture designs that satisfy the requirements listed in Section 4. Section A6.3 provides cost estimates for 6 implementations of the designs. Section A6.4 provides architecture designs and cost estimates for representative peerDAACs.

A6.1 SuperDAAC Hardware Components

Tables A6-1 and A6-2 list the variety of tape robots and tape stackers available in 1994 to support data archival storage systems. They include helical scan technology (DD-2 and DD-3) and high-density linear tape systems (DLT and NTP). Other current technologies include DD-1 helical scan tape, which supports access rates of up to 32 MB/s, and optical tape, which supports capacities of up to 1 TB per tape. One drawback of optical tape systems is their slow read rate of less than 1 MB/s. All costs listed in the tables are based on verbal information and do not represent vendor quotes.

Table A6-1: Tape Technology Attributes
Type | Example COTS | GB/tape | Tapes / cabinet | TB / cabinet | Drives per cabinet | Tape access time (sec) | Tape rewind time (sec)
DD2 - robot | Ampex 8xx | 80 | 264 | 20.6 | 4 | 7 |
DD2 - robot | Ampex 3xx | 330 | 8 | 2.6 | 1 | |
DLT - robot | IBM NTP 3495 | 20 | 14000 | 273.5 | 16 | 18-36 | 90
DLT - stacker | DEC T2877 | 20 | 7 | 0.14 | 1 | 30 |
DLT - robot | DEC TL820 | 20 | 264 | 5.2 | 3 | 45 | 30
DD3 - robot | Storage Tek | 20 | 5120 | 100.0 | 16 | 22 | 50
Table A6-2: Tape Technology Performance/cost
Type | Example COTS | Hours to read 1 tape | Days to read a cabinet using maximum number of drives | Sustained MB/s per drive | Sustained MB/s per channel | Cost ($K) per tape media | Cost ($K) per drive | Cost ($K) per cabinet
DD2 - robot | Ampex 8xx | 1.5 | 4.1 | 15 | 20 | $0.12 | $110 | $330
DD2 - robot | Ampex 3xx | 6.3 | 2.0 | 15 | 20 | $0.12 | $120 | $50
DLT - robot | IBM NTP 3495 | 0.4 | 13.5 | 15 | 20 | $0.02 | $60 | $800
DLT - stacker | DEC T2877 | 2.3 | 0.7 | 2.5 | 10 | $0.03 | $6 | $9
DLT - robot | DEC TL820 | 2.3 | 8.4 | 2.5 | 10 | $0.03 | $6 | $85
DD3 - robot | Storage Tek | 0.4 | 4.8 | 15 | 20 | $0.14 | $150 | $285

It is interesting to note that the tape systems in Tables A6-1 and A6-2, when equipped with the maximum allowed number of tape drives, can read all the data they hold in intervals ranging from 0.7 to 13.5 days. This implies that 1 PB of data could be read in the same time interval if the multiple robots used to store that data were all accessed in parallel. This minimum data access period determines how suitable a given tape robot is for supporting data streaming and defines the expected access performance levels for building a streaming data system. The most cost-effective system tends to be the one that reads all the data in the desired time period: systems that read the data at a faster rate waste resources on I/O capacity, and systems that read the data at a slower rate waste resources on storage capacity.
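
As an illustrative sketch (not part of the original analysis), the per-tape and per-cabinet read times in Table A6-2 follow directly from the capacity and drive-rate figures in Table A6-1; the Python fragment below approximately reproduces them, with small differences attributable to rounding in the table values.

    # Illustrative sketch only: recompute the Table A6-2 read times from the
    # Table A6-1 attributes; small differences from the table are rounding.
    def hours_to_read_tape(gb_per_tape, drive_mb_s):
        return gb_per_tape * 1000.0 / drive_mb_s / 3600.0

    def days_to_read_cabinet(tb_per_cabinet, drives_per_cabinet, drive_mb_s):
        return tb_per_cabinet * 1.0e6 / (drives_per_cabinet * drive_mb_s) / 86400.0

    # IBM NTP 3495: 20 GB/tape, 273.5 TB/cabinet, 16 drives at 15 MB/s
    print(round(hours_to_read_tape(20, 15), 1))           # ~0.4 hours per tape
    print(round(days_to_read_cabinet(273.5, 16, 15), 1))  # ~13.2 days per cabinet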

Table A6-3 lists the attributes and cost for the DBMS platforms. The database server supports the tape robots, the disk cache and tape logging systems, and enough I/O bandwidth to sustain the peak access rates. The 3 types of database platforms differ chiefly in the internal I/O bandwidth. The external bandwidth that can be supported is limited by the number of storage devices that are daisy-chained on a single I/O channel. The data-processing requirements are met by increasing the number of database platforms to satisfy either the external or internal I/O bandwidth requirements that are derived from the user load.
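
A minimal sketch of that scaling rule is shown below; the load figures in the example are hypothetical placeholders, not values taken from this report, and the 20 MB/s per-channel rate is the figure from Table A6-2.

    import math

    # Sketch: the number of database platforms is set by whichever is larger,
    # the count needed for external (device-facing) bandwidth or the count
    # needed for internal memory-to-I/O bandwidth.
    def dbms_platforms_needed(external_mb_s, internal_mb_s,
                              per_platform_external_mb_s, per_platform_io_mem_mb_s):
        by_external = math.ceil(external_mb_s / per_platform_external_mb_s)
        by_internal = math.ceil(internal_mb_s / per_platform_io_mem_mb_s)
        return max(by_external, by_internal)

    # Hypothetical example using the scalable MPP base system of Table A6-3
    # (16 busses, assumed 20 MB/s per channel, 2.73 GB/s IO:mem bandwidth).
    print(dbms_platforms_needed(external_mb_s=960, internal_mb_s=960,
                                per_platform_external_mb_s=16 * 20,
                                per_platform_io_mem_mb_s=2730))   # -> 3 platforms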

The preferred CRAY Superserver platform would have 32 CPUs and 32 GB of memory. The listed system has 64 CPUs and 16 GB of memory but costs approximately the same. If possible, even larger memories are desired, on the order of 200 GB, to store the entire active metadata index in memory.

The scalable MPP minimum configuration is assumed to be half of one cabinet holding 32 CPUs.

Table A6-3: DBMS Server Technology
Hardware class | Example COTS | Number of processors | Sustained Mflops / processor | GB RAM / processor | Number of busses | GB/s per bus | IO:mem GB/s | $M / base system
Commodity MP | IBM RS/6000 | 1 | 143 | 2 | 2 | 0.08 | 0.16 | $0.370
High-performance MP | CRAY Superserver | 64 | 20 | 0.25 | 4 | 1.70 | 6.80 | $7.000
Scalable MPP | Representative parallel system | 32 | 10 | 0.25 | 16 | 0.17 | 2.73 | $1.600

As pointed out earlier, a disk cache is needed to support interactive queries, sized to hold either one week's worth of derived products or 1% of the archived data. Both of these estimates point to an 18-TB cache. In addition, disk is needed to support all of the metadata associated with the archived data. The metadata size is assumed to be 0.1% of the archived data, or about 2 TB. Three copies of the metadata are kept: one supporting the database platform that responds to interactive queries (the DBMS Index Server, or CPU 1 in the example architecture diagrams), one for the request queueing system (CPU 5 in the example architecture diagrams), and a third that is distributed across the DBMS platforms (CPU 4 in the architecture diagrams). The total disk requirement is 24 TB. Disk systems of this size are available, and 2 versions are listed in Table A6-4.
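
The arithmetic behind the 24-TB figure can be summarized in a short sketch using the report's own numbers:

    # Disk sizing from the figures quoted above.
    cache_tb = 18               # interactive cache: ~1 week of products, or ~1% of the archive
    metadata_tb = 0.001 * 2000  # metadata assumed to be 0.1% of the ~2 PB archive, in TB
    metadata_copies = 3         # index server (CPU 1), request queueing (CPU 5), DBMS servers (CPU 4)
    total_disk_tb = cache_tb + metadata_copies * metadata_tb
    print(total_disk_tb)        # -> 24.0 TB total disk requirement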

Table A6-4: Terabyte Disk Technology
Disk technology | Cost
Cray network disk array | $1.17M/TB
Commercial RAID (e.g., Auspex) | $1.76M/TB

The disk cache is a significant cost component of the overall system. We have chosen to use the Cray network disk array in all the cost estimates.

The last major component of the superDAAC is the compute server. We considered 3 technologies: workstation clusters, scalable MPP systems, and vector supercomputers. Table A6-5 lists their assumed attributes. The Gflops rate is taken from the 100 x 100 LINPACK performance numbers published by Jack Dongarra. (The data is available via electronic mail from netlib@ornl.gov.) The computation rate for a single CPU was multiplied by the number of processors. We noted that many of the operations required to generate standard products can be done using integer arithmetic or byte manipulation. To provide a conservative estimate of the number of compute servers needed to generate the standard products, we assume the rate at which a platform sustains integer operations (Giga Int Ops) is comparable to the rate at which it sustains floating-point arithmetic (Gflops).

Table A6-5: Compute Platform Technology
Type | Number of processors | Sustained Gflops | Sustained Giga Int Ops | I/O:mem (GB/s) | $M / computer
DEC 7000 | 4 | 0.18 | 0.27 | 0.30 | $0.55
IBM RS/6000 | 1 | 0.14 | 0.15 | 0.16 | $0.35
Supercomputer | 16 | 6.20 | 6.20 | 32.00 | $32.00
Scalable MPP | 32 | 0.32 | 0.35 | 2.73 | $1.44

The sustained performance is used to calculate the number of such platforms needed to create the standard products for the targeted user load. The column Giga Int Ops refers to data-processing steps that can be done with integer- or byte-oriented arithmetic. The column I/O:mem (GB/s) refers to the I/O bandwidth, in GB/s, from memory to external I/O devices that is sustainable within the computer platform. The MPP compute platform is assumed to have 1 processor per node.
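
A minimal sketch of this sizing step follows. Summing the floating-point and integer rates follows the conservative approach described in Section A6.3.1; the required sustained rate used in the example (40 giga operations per second) is a hypothetical placeholder, not the load figure from Table 4-1.

    import math

    # Sketch: platforms needed = required sustained rate / per-platform rate.
    # The 40 Gops/s requirement below is a hypothetical placeholder.
    def compute_servers_needed(required_gops, gflops, giga_int_ops):
        return math.ceil(required_gops / (gflops + giga_int_ops))

    required_gops = 40.0
    for name, gflops, gintops in [("DEC 7000", 0.18, 0.27),
                                  ("IBM RS/6000", 0.14, 0.15),
                                  ("Supercomputer", 6.20, 6.20),
                                  ("Scalable MPP", 0.32, 0.35)]:
        print(name, compute_servers_needed(required_gops, gflops, gintops))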

A6.2 Three SuperDAAC Architectures

We present 3 designs for the superDAAC that make use of different high-speed interconnects to link the compute and data server platforms. Technology that exists today includes high-speed backbones such as HIPPI and ATM, and high-speed backplanes that directly link workstation CPUs in a mesh topology. The HIPPI backbones are used in production at the national supercomputer centers. The ATM backbone is rapidly becoming available, although current commercially available switch technology is limited to OC-3 speeds. The high-speed backplane technology provides the fastest interconnect for linking CPUs and exists as a separate commercially available product. The interconnect that will be used 3-6 years from now to link multiple compute platforms will probably be one of these technologies.

An example superDAAC architecture based on the SPARC superserver and the CRAY C90 vector supercomputer linked by a HIPPI backbone is shown in Figure A6-1. The major components are a data multiplexor to minimize the number of times the input data is stored and read, a metadata interactive response engine (metadata index server) that stores the active index in memory, C90 compute engines to generate the higher-level data, a request queueing system that backs up the metadata engine, tape storage robots to archive the raw data, and a database management system that stores the standard products and supports reading of the data archive. The amount of data that can be stored is scalable by adding or deleting tape robots and compute servers.

Figure A6-1: SuperDAAC Hierarchical Hardware Architecture

The data multiplexing system (labeled DB Insert (CPU 3) in Figure A6-1) forks copies of incoming level 0 data streams to the storage archive, the Request Queueing System, and a data-processing system that generates higher-level data from level 0 data. The data multiplexing system is required to continually ingest level 0 data at 3 MB/s. A temporary data cache that can hold up to 16 hours of level 0 data is used to save data that cannot be held in the multiplexor's memory. The specific hardware architecture shown in Figure A6-1 uses an 8-processor, 8-GB RAM CRAY CS-6400 SPARC Superserver system as the level 0 multiplexing server. Additionally, during disaster recovery situations, the multiplexing system is required to handle level 0 data that is pulled out of the archive at the rate of 12 MB/s. This rate is fast enough to allow 1 TB of data to be restored each day.
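
A quick check of the buffer sizing implied by these rates (a sketch, not part of the original design text):

    ingest_mb_s = 3.0                                # continuous level 0 ingest rate
    buffer_gb = ingest_mb_s * 16 * 3600 / 1000.0
    print(round(buffer_gb))                          # ~173 GB holds 16 hours of level 0 data

    recovery_mb_s = 12.0                             # disaster-recovery read-back rate
    print(round(recovery_mb_s * 86400 / 1.0e6, 2))   # ~1.04 TB restored per day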

The DBMS runs either as a distributed system or as hierarchical parent/child servers. The DBMS Index Server (CPU 1) provides support for interactive queries. This system has sufficient disk to store all the metadata and is intended to have enough memory to cache all the active metadata indices. A well-balanced system would have on the order of 200 GB of memory for this purpose. The 64-processor, 16-GB RAM CRAY CS-6400 SPARC Superserver that is used in the design is a place-holder for a 32-processor, 32-GB system that is expected to be available in early 1995. Both systems have equivalent costs. The CS-6400 supports hot swapping of CPUs for higher reliability.

The Request Queueing System (CPU 5) is the interface between the superDAAC and the external world. It acts as a security firewall, supports the communication links to the second superDAAC and to the peerDAACs, maintains a backup copy of the metadata, and has the capacity to take over as the DBMS Index Server in case CPU 1 goes down. It supports queueing of requests that require reading the entire archive. The platform also has sufficient capacity to support Hierarchical Storage Management software for controlling migration of data objects between tape and disk. As before, the preferred hardware platform would be a 32-processor, 32-GB system.

The DBMS servers (CPU 4) control both the tape robots and the Hot Cache for storing active data objects. Each of the DBMS servers runs the distributed DBMS. The data subsets that are needed to answer a DBMS query are read from the tape robots (accessed at 15 MB/s) and sent to the compute servers for processing. If the standard product already exists on the Hot Cache, it can be sent directly to the Request Queueing System for transmission across the network. Sufficient internal bus bandwidth is needed to allow all tape drives to be read simultaneously. The CS-6400 SPARC Superserver has sufficient capacity to do this. The design uses the IBM NTP 3495 tape robot, storing up to 273 TB of data. Each robot can support up to 16 tape drives. As before, the preferred hardware platform for the DBMS server would be a 32-processor, 32-GB system.

The compute servers (labeled CPU 2 in Figure A6-1) are CRAY C90 vector supercomputers. They act as servers for explicit execution of external functions at the request of the database engine. Data is moved from the DBMS servers (data-subsetting platforms) over the HIPPI LAN for processing, then to the Request Queueing System for transmission over the network.

The second superDAAC architecture (shown in Figure A6-2) is based on workstation technology and a combination of HIPPI and ATM LANs. The same architecture design is used to support the incoming data stream, the Request Queueing System, and the DBMS Index Server for interactive response to user queries. The C90 compute servers and the CS-6400 DBMS servers are replaced with commodity workstations linked by an ATM/HIPPI network. It may be possible to replace the HIPPI switch with a 128-port ATM switch running at OC-3, provided the maximum internal LAN bandwidth does not require aggregating data sets from multiple tape drives. Each tape drive is capable of reading data at 15 MB/s. Thus, the output from 1 tape drive could saturate an OC-3 link.

This design assumes that there will be data manipulation operations that require generating aggregate data sets at the maximum design rate of 63 MB/s. OC-3 links are assumed for links that do not require more than 15 MB/s. All I/O channels supporting aggregation across multiple OC-3 channels are implemented with HIPPI channels at 100 MB/s. This imposes the maximum sustainable data rate for this design. If higher rates are needed, double-wide HIPPI could be used at 200 MB/s. Alternatively, when OC-24 or OC-48 switch technology is available, the HIPPI channels could be replaced with ATM. The HIPPI/OC-3 concentrator technology is available from Netstar.
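
A back-of-envelope check of these link choices (ignoring protocol overhead) is sketched below.

    oc3_mb_s = 155.52 / 8          # OC-3 line rate, ~19.4 MB/s before protocol overhead
    drive_mb_s = 15.0              # sustained read rate of one tape drive
    print(drive_mb_s / oc3_mb_s)   # ~0.77: a single drive consumes most of an OC-3 link

    aggregate_mb_s = 63.0          # maximum aggregation rate assumed in this design
    hippi_mb_s = 100.0
    print(aggregate_mb_s <= hippi_mb_s)   # True: one HIPPI channel carries the aggregate
    print(aggregate_mb_s / oc3_mb_s)      # ~3.2: several OC-3 links would otherwise be needed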

Approximately 124 workstations are required to support the tape drives, the 24 TB of disk cache, and computation of the standard products. The IBM NTP tape robot technology is used, requiring only 8 tape silos to store the 2 PB of data. Each silo has 9 tape drives that are connected through separate controllers to different workstations across fast-wide SCSI links. A distributed database runs across all of the workstations. If DEC 7000 platforms are used, a total of 124 workstations are needed, with nominally 106 of them serving as compute servers and 18 as tape and disk controllers. In practice, the compute and I/O requirements are spread as uniformly as possible across all of the workstations.

The viability of the workstation architecture versus the vector supercomputer architecture is driven by price/performance. The current disadvantages are the larger number of systems that are needed and the lower-speed LAN connecting the workstations.

Figure A6-2: SuperDAAC Large-cluster Hardware Architecture

Our third superDAAC design (shown in Figure A6-3) is an application of scalable MPP technology. Again, the same architecture design is used to support the incoming data stream, the Request Queueing System, and the DBMS Index Server for interactive response to user queries. The commodity workstations are replaced with a scalable MPP. This effectively replaces the ATM LAN with the higher-performing 2D mesh backplane technology. Current MPP backplanes support hardware I/O rates of 175 MB/s. Actual sustained data movement has been measured at 85-170 MB/s.

A representative MPP design, based on the current Intel Paragon, is shown. The Pentium Paragon uses the commodity Pentium CPU chip. By 1995, MPPs will also support the PCI peripheral device interconnect and fast-wide SCSI access.

The same tape technology and disk technology are used as for the workstation cluster. The tape silos and disk arrays are attached directly to a subset of the MPP nodes, with the remainder of the nodes used for computation. The data flow through the Intel MPP is predominantly left to right. For this particular design, there are 1,632 compute nodes, 72 nodes supporting tape drives, and 32 nodes supporting disk drives.

The desirability of the MPP architecture versus the workstation architecture is dependent on price/performance. The advantages are a higher-speed backplane, support for the same distributed programming environment, and more compact packaging. Fault-tolerant parallel database systems exist. An example is Teradata.

A6.3 SuperDAAC Cost Analysis

A cost analysis helps determine which of the hardware components provides the best price/performance and demonstrates the tradeoffs that could be made in the system design. We have used a spreadsheet to explore the range of options. The spreadsheet accounts for technology cost as a function of load requirements in the areas of database servers, storage devices, and compute platforms. A simple system simulation was created that modeled the number of hardware platforms needed to support the I/O, storage, and compute load requirements as a function of the hardware platform capabilities. Figures A6-1, -2, and -3 were based on the results of these simulations. Platform cost estimates were used to predict the cost of the resulting system in 1994 dollars.

Figure A6-3: SuperDAAC MPP Hardware Architecture

The simulation was driven by the user load variables defined in Table 4-1. These were combined with the architecture lifetime to produce the I/O and compute load listed in the same table. In order to produce a conservative design, the hardware requirements needed to support the storage capacity, the peak I/O bandwidth, and the sustained computation rate were calculated separately. This implies that there is additional I/O capacity available from the compute platforms, and additional compute capacity available from the I/O platforms. The actual hardware platform characteristics are listed in Tables A6-1, -2, -3, -4, and -5.

The components used in the simulation are divided into tape storage devices, database interactive response platforms, storage server platforms that support direct access to the tape robots, networking platforms that serve as input and output distribution systems, network switch technology for connecting the systems together, a request queuing system for supporting global queries, and compute platforms.

A major result of the simulation is that, even at current prices, a data-processing system capable of storing all data on nearline tape robots and accessing all of the data within 1 month is affordable. A system consisting of NTP storage technology, scalable MPP database management servers, and compute servers could support 2 PB of data for a cost of $99.6M, plus another $31.5M for 24 TB of disk.

A6.3.1 SuperDAAC Cost Tradeoffs

The number of platforms needed to support the load was derived by dividing the corresponding load requirement by the appropriate hardware characteristic. While this was done, checks were made to ensure that the internal bus limits of the hardware platforms were not exceeded. In practice, the combination of hardware platform limits led to a conservative design. For example, at 15 MB/s tape read rates, only 3 tape drives are needed to continually store all the standard products. Each tape robot has at least 1 tape drive, however, and the number of robots is determined by dividing the total storage requirement by the capacity of the tape robot. To be able to read and write at the same time, 2 tape drives per silo are used when possible. Thus a nearline archival storage system has a read rate that is substantially higher than that needed to store the data, provided the LAN can support the aggregate storage device bandwidth.
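
The sketch below illustrates this sizing logic under simplified assumptions (robots sized purely by capacity, one drive per robot reserved for writing); the report's spreadsheet applies additional channel and bus constraints, so its figures in Tables A6-7 and A6-8 differ somewhat.

    import math

    # Simplified sizing sketch: not the report's spreadsheet model.
    def size_archive(total_tb, silo_tb, drives_per_silo, drive_mb_s):
        robots = math.ceil(total_tb / silo_tb)          # enough silos for the capacity
        stored_tb = total_tb / robots                   # data actually stored per silo
        read_drives = max(1, drives_per_silo - 1)       # one drive per silo reserved for writing
        read_days = stored_tb * 1.0e6 / (read_drives * drive_mb_s) / 86400.0
        return robots, read_days

    # 2 PB archive on IBM NTP 3495 silos with the maximum 16 drives each:
    robots, days = size_archive(total_tb=2000, silo_tb=273.5,
                                drives_per_silo=16, drive_mb_s=15)
    print(robots, round(days, 1))   # -> 8 silos, ~12.9 days (Table A6-8 reports 14.7)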

Two optimizations of the superDAAC addressed tape-drive issues. One sought the minimum price such that the time needed to read the entire archive was about 1 month, and one determined the effect of putting the maximum possible number of drives in each tape robot. Table A6-6 gives the hardware components used in the optimizations.

Table A6-6: SuperDAAC Configurations
Type of platform | Storage device | Database platform | Compute server | Interconnect technology | Request queueing system
WS/Ampex | Ampex 8XX | RS/6000 | RS/6000 | ATM | CS-6400
WS/STK | Storage Tek | RS/6000 | RS/6000 | ATM | CS-6400
WS/DLT | DEC TL820 | RS/6000 | RS/6000 | ATM | CS-6400
WS/NTP | IBM NTP 3495 | DEC 7000 | DEC 7000 | ATM | CS-6400
Vector/NTP | IBM NTP 3495 | CS-6400 | Supercomputer | HIPPI | CS-6400
MPP/NTP | IBM NTP 3495 | Scalable MPP | Scalable MPP | Backplane | CS-6400

The configurations chosen for evaluation included 4 based on workstation technology (WS) to compare storage devices, and 3 based on a particular tape storage device to compare interconnect technology and the associated compute platforms. For each configuration, the number of hardware components and the associated costs were calculated. An attempt was made to keep the aggregate performance of the configurations as similar as possible. The time to read the archive was kept close to 30 days. The amount of memory per platform was adjusted such that the total amount of memory was approximately 260-280 GB. The latter was not possible for the vector supercomputer system, which had only a combined 130 GB of memory and fast solid-state storage device RAM.

The results of the first optimization, for a 30-day archive read period, are given in Table A6-7. The column labeled Data server cost ($M) includes the costs for the database servers, the metadata active index memory cache, the storage servers, the network switches, and the batch queueing server. The archive tape cost includes the tape robots, tape drives, and data tapes, and varies from $12M to $58M. Not surprisingly, the lowest-cost system is the one that stores the most data in a single robot. The maximum number of days needed to read all the archived data is calculated by assuming 1 tape drive is reserved for writing to the tape robot.

Table A6-7: SuperDAAC Hardware Components
Type of platform | Max number of days to read archive | Number of data servers | Number of compute servers | Number of tape robots | Data server cost ($M) | Disk cache cost ($M) | Archive tape cost ($M) | Compute platform cost ($M) | Total hardware cost ($M)
WS/Ampex | 33.4 | 22 | 136 | 100 | $28.6 | $31.5 | $58.0 | $47.6 | $165.7
WS/STK | 27.0 | 21 | 136 | 21 | $28.1 | $31.5 | $33.3 | $47.6 | $140.5
WS/DLT | 37.5 | 26 | 136 | 398 | $30.4 | $31.5 | $44.1 | $47.6 | $153.6
WS/NTP | 27.7 | 18 | 106 | 8 | $29.5 | $31.5 | $12.8 | $55.7 | $129.5
Vector/NTP | 27.7 | 2 | 4 | 8 | $31.0 | $31.5 | $12.8 | $128.0 | $203.3
MPP/NTP | 27.7 | 4 | 60 | 8 | $21.3 | $31.5 | $12.8 | $65.4 | $131.1

The capability of each design was limited by the functionality of the individual hardware components. For instance, the time needed to read all the data in the archive varies from 27 days to 37 days around the target of 30 days. This is governed by the number of tape storage devices used per silo versus the silo capacity. The system is storage capacity-limited rather than I/O access-limited. The number of tape silos is found by dividing the total storage requirement of 2 PB by the tape silo capacity. The bandwidth needed to read the archive each month is 732 MB/s, while the aggregate read capability across the tape robots varies from 1 to 3 GB/s. Not surprisingly, the 2 archival storage systems that cost the most have the higher 3 GB/s aggregate read capability. The total read rate could have been held fixed across each of the tape robot architectures if the archive read period had been allowed to vary. The cheaper archive systems are the ones that are better balanced between I/O and storage capacity. The varying costs for the archival storage devices also depend on the price/performance of the associated tape technology. The NTP 3495 system marketed by IBM achieves a low price by storing 14,000 tapes in a tape silo.

The varying costs for the database server platforms are a function of the internal I/O bus and the maximum number of peripheral devices that can be supported per platform. The varying compute platform costs arise from price/performance variations. Note that it is possible for a particular architecture to be less cost-effective for computation but more cost-effective for supporting external I/O and, thus, still achieve a reasonable total cost. The MPP platform is a factor of 1.5 more expensive for computation, but a factor of 1.4 less expensive for supporting the tape robots.

The design is conservative for both the I/O storage rate and the computation rate. The computation rates for a given platform were taken from the LINPACK 100 x 100 benchmark. This is a relatively small problem that does not allow efficient use of vector registers. It is expected that the large data files being stored for EOSDIS will allow more effective use of the arithmetic units and up to a factor of 2 higher computation rate. The computation rate needed to generate the standard products is a combination of integer, byte-manipulation, and floating-point operations. The required execution rate is taken to be the sum of these operations. A less conservative approach would predict a smaller total execution rate by taking the maximum of these computation rates. As discussed in Section 5, 2 superDAACs and 150 peerDAACs are being proposed, with a total compute capacity that is a factor of 3.5 times that of a single superDAAC. Given that the compute capacity of a single superDAAC is 2-4 times the rate that is needed, we believe the total design capacity is 7-14 times that needed to handle the push processing. Explicit benchmarks of the standard product generation codes are needed on the chosen compute platforms to quantify the correct number of platforms to buy.

The number of data servers, tape robots, and compute servers that can be maintained is a strong function of their intrinsic failure rate. Assuming that each device can run for 2 years without failure, a reasonable load on the support staff is to have no more than 1 failure per working day. This implies that the total number of hardware systems should be fewer than 400. From the user perspective, more than 1 failure per week is very noticeable, implying the number of systems should be fewer than 60. Only the vector supercomputer platform and the scalable MPP system achieve this. Their storage architectures used tape robots that individually store 270 TB of data, together with higher-performance storage and compute servers. The workstation-based systems needed 106-136 compute servers to handle the compute requirements and 18-26 data servers to support the storage devices and disk cache. The cost of the workstation-based systems was slightly lower than that of the scalable MPP system and about 40% lower than that of the vector supercomputer system. By 1997, it is expected that workstation performance will have increased sufficiently that it will be feasible to use workstations as the data and compute servers.

The second optimization demonstrates what happens when the number of I/O channels is increased such that every tape robot can be read at the maximum available bandwidth. This analysis was produced by decreasing the number of storage devices per I/O channel to 1 and using the maximum permissible number of tape drives in each robot. Table A6-8 gives the result for the optimization.

The shortest time to read the entire archive is now about 2 weeks or less for all the storage devices, except for the DEC TL820. The TL820 was already reading the archived data at its maximum permissible rate. The cost increment varies from 5% for a factor of 2 improvement in read access rate, to 21% for a factor of 6 improvement. The total read access rate is about 6 GB/s for the Ampex system costing $82M, about 3 GB/s for the STK and DLT systems costing about $50M, and 2 GB/s for the NTP system costing $16M.
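
The cost increments and read-rate improvements quoted above can be checked directly against Tables A6-7 and A6-8, as in the short sketch below.

    # Totals ($M) and archive read times (days) from Tables A6-7 and A6-8.
    base_cost = {"WS/NTP": 129.5, "WS/Ampex": 165.7}
    fast_cost = {"WS/NTP": 135.8, "WS/Ampex": 199.7}
    base_days = {"WS/NTP": 27.7, "WS/Ampex": 33.4}
    fast_days = {"WS/NTP": 14.7, "WS/Ampex": 5.6}

    for name in base_cost:
        increment = 100.0 * (fast_cost[name] - base_cost[name]) / base_cost[name]
        speedup = base_days[name] / fast_days[name]
        print(f"{name}: +{increment:.0f}% cost for a {speedup:.1f}x faster archive read")
    # -> WS/NTP: +5% cost for a 1.9x faster archive read
    # -> WS/Ampex: +21% cost for a 6.0x faster archive read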

Table A6-8: Highest Retrieval Rate SuperDAACs
Type of platform | Max number of days to read archive | Number of data servers | Number of compute servers | Number of tape robots | Data server cost ($M) | Disk cache cost ($M) | Archive tape cost ($M) | Compute server cost ($M) | Total hardware cost ($M)
WS/Ampex | 5.6 | 52 | 136 | 100 | $40.5 | $31.5 | $80.0 | $47.6 | $199.7
WS/STK | 9.0 | 33 | 136 | 21 | $32.9 | $31.5 | $52.2 | $47.6 | $164.2
WS/DLT | 37.5 | 26 | 136 | 398 | $30.4 | $31.5 | $44.1 | $47.6 | $153.6
WS/NTP | 14.7 | 23 | 106 | 8 | $32.4 | $31.5 | $16.2 | $55.7 | $135.8
Vector/NTP | 14.7 | 3 | 4 | 8 | $38.0 | $31.5 | $16.2 | $128.0 | $213.7
MPP/NTP | 14.7 | 5 | 60 | 8 | $22.5 | $31.5 | $16.2 | $65.4 | $135.6

A6.4 PeerDAACs

PeerDAACs provide administrative decision points for local control of data archives. An advantage of peerDAACs is that they can provide access to the set of data of most interest to a particular research group and allow that group to govern its own data sets. The disadvantage is that they require maintenance and system support to guarantee data reliability.

A modified version of the design criteria used for the superDAACs can be applied to the design of peerDAACs. The same hardware systems are used: data storage devices, database server platforms, database interactive response servers, network switches, data input/output platforms, and compute servers. However, the user load requirements are different.

We assume that a peerDAAC will archive 1/200 of the raw data that is also being archived at the superDAACs. Aggregated across 200 peerDAACs, this implies that all the raw data archived at the superDAACs is also archived at the peerDAACs. The users of the peerDAACs will do lazy evaluation of the raw data sets. Global queries can be processed at the superDAACs or through a distributed database system that links all of the peerDAACs. All data products generated at the peerDAACs are stored at the data creation site.

A6.4.1 PeerDAAC Simulation

A similar simulation was used to analyze multiple system architectures for the peerDAACs. The load parameters for the peerDAACs are similar to those for the superDAAC but include 2 significant differences. No eager processing of the raw data stream was assumed. Instead, the peerDAAC was designed to do lazy evaluation of user requests at a predetermined average rate, which was set equal to the rate at which the peerDAAC receives raw data. This effectively means the peerDAAC is designed to have enough compute and storage capacity to lazily evaluate data at the same rate that would have been needed to do eager evaluation of the incoming raw data stream. Two peerDAAC optimizations were done: one for a minimal peerDAAC that is capable of processing and storing 1/200 of the superDAAC raw input stream rate, and one for a large peerDAAC capable of processing and storing 1/50 of the superDAAC raw input stream rate. More powerful peerDAACs can process raw data at some multiple of the raw data rate and will have to store proportionally more data.

A6.4.2 Minimal PeerDAAC

The results of the first optimization are presented in Table A6-9 for a peerDAAC that stores 7.4 TB of data. In the architecture design, each data storage device is connected by an I/O channel that is able to sustain the full tape-transfer rate to guarantee the best possible data access rate. The peerDAAC architectures are built from workstation technology and use either the DEC TL820 tape robot or the DEC T2877 tape stacker. The tape robot attributes are given in Tables A6-1 and A6-2. The workstation attributes are given in Table A6-5.

Table A6-9: Minimal PeerDAAC
Type of platform | Max number of days to read archive | Number of data servers | Number of compute servers | Number of tape robots | Data server cost ($M) | Disk cache cost ($M) | Archive tape cost ($M) | Compute platform cost ($M) | Total hardware cost ($M)
IBM RS/6000/TL820 | 12.5 | 2 | 1 | 2 | $1.35 | $0.11 | $0.22 | $0.37 | $2.04
DEC 7000/DLT | 0.7 | 3 | 1 | 56 | $2.01 | $0.11 | $0.85 | $0.55 | $3.52

The system design evaluation shows that the minimum period for reading the entire archive is between 0.7 and 12.5 days, at costs of $3.52M and $2.04M, respectively. The system cost is dominated by the data server cost, which includes a DBMS Index Server designed to provide interactive response to database queries. Fast interactive response is accomplished by providing enough memory to hold the active metadata index entirely within the workstation memory. Two workstations are used to support the tape robots and a disk cache that holds 1% of the total archive, or 74 GB. An I/O platform is included to act as a security firewall for the system and to support external communications.
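
A minimal check of this sizing (a sketch using the figures quoted above and in Table A6-1):

    import math

    peer_archive_tb = 7.4
    cache_gb = 0.01 * peer_archive_tb * 1000.0   # 1% of the archive held on disk
    tl820_tb = 5.2                               # DEC TL820 capacity, Table A6-1
    robots = math.ceil(peer_archive_tb / tl820_tb)
    print(round(cache_gb), robots)               # -> 74 GB of disk cache and 2 TL820 robots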

Figure A6-4 shows the peerDAAC implementation. ATM networking technology is used to connect the security firewall workstation (CPU 3) to the DBMS Index Server workstation (CPU 1) that provides support for interactive queries of the database. Two DBMS server workstations (CPU 4) support the 2 tape robots, and 1 workstation (CPU 2) acts as the compute server and supports the active disk cache. As more data is added to the system, more tape robots can be added along with controlling workstations, or larger-capacity tape robots can be used. Note that as the storage capacity increases, the number of compute workstations must also be increased to maintain a balanced system.

Figure A6-4: PeerDAAC Architecture

A6.4.3 Large PeerDAAC

A larger peerDAAC can be designed that archives a larger fraction of the raw input data stream. A peerDAAC that stores 2% of the raw input rate will store 30 TB after 3 years. Table A6-10 gives the resulting system and cost.

Table A6-10: Large PeerDAAC Costs
Type of platform | Max number of days to read archive | Number of data servers | Number of compute servers | Number of tape robots | Data server cost ($M) | Disk cache cost ($M) | Archive tape cost ($M) | Compute platform cost ($M) | Total hardware cost ($M)
IBM RS/6000/TL820 | 12.5 | 2 | 2 | 6 | $1.03 | $0.43 | $0.66 | $0.74 | $2.86
DEC 7000/NTP | 3.0 | 2 | 2 | 1 | $1.38 | $0.43 | $1.31 | $1.10 | $4.22

Increasing the rate at which the peerDAAC can process data by a factor of 4 increased the cost by 20-40%. Note that the previous $2M architecture cost has increased to only $2.9M, with most of the additional cost attributed to an additional compute platform and 4 additional tape robots. The TL820 archive read time was calculated as before by assuming that 1 tape drive is being reserved to write data into the silo. The achievable read time can only be lowered by going to an alternate technology. An example is given in the second case for which the IBM NTP tape silo is used. Normally, the time to read an NTP silo is a minimum of 14 days. However, the total amount of data stored in the silo is only 30 TB compared to the silo capacity of 270 TB. Thus, the time to read the partially full silo is significantly smaller.