The World-Wide Telescope: Building the virtual astronomy observatory of the
future (4/5/2002)
Astronomers are collecting huge quantities of data, and they are
starting to federate it. They held a Virtual Observatory
conference in Pasadena to discuss the scientific and technical aspects
of building a virtual observatory that would give anyone, anywhere, access to all
the online astronomy data. My contribution (ppt) was a computer science
technology forecast (also available as doc or pdf).
The Virtual Observatory will create a "virtual" telescope on the
sky (with great response time): information at your fingertips for
astronomers, and for everyone else. A single-node prototype is at http://skyserver.sdss.org/.
More recently, Tanu Malik, Tamas Budavari, Ani Thakar, and Alex Szalay have
built a 3-observatory SkyQuery
(http://SkyQuery.net/) federation using .Net web services (I
helped a little).
Alex and I have been writing papers about this. A "general audience" piece on
the World-Wide Telescope appeared in Science Magazine, V.293, pp. 2037-2038, 14 Sept
2001 (MS-TR-2001-77,
word or
pdf). More recently we wrote two papers describing the SkyServer. The
first describes how the SkyServer is built and how it is used: "The
SDSS SkyServer - Public Access to the Sloan Digital Sky Survey Data."
A second paper (read it only if you loved the first one) goes into gory detail
about the SQL queries we used in data mining; it is MSR TR 2002-01: "Data
Mining the SDSS SkyServer Database." I have been giving lots of
talks about this.
Tom Barclay, Alex Szalay, and I gave an
overview talk at the Microsoft Faculty Summit that sketches this idea.
I gave a talk on computer technology, arguing for online disks (rather than
nearline tape), cheap processor and storage CyberBricks, and heavy use of
automatic parallelism via database technology. The talk's slides are
in PowerPoint (330KB), and an extended abstract of the talk is available in
Word (330KB) and
pdf (200KB). The genesis of my interest in this is documented in the
paper: "Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan
Digital Sky Survey", Alexander S. Szalay, Peter Kunszt, Ani Thakar, Jim Gray
MSword(220KB) or
PDF (230 KB).
The
Future of super-computing and computers (Gordon Bell
was the principal author) (8/1/2001)
Gordon Bell assesses supercomputing every five years or so. This time I helped
and argued with him a bit. This discussion focuses on technical computing, not
AOL or Google or Yahoo! or MSN, each of which would be in the top 10 of the
Top500 if they cared to enter. After 50 years of building high performance
scientific computers, two major architectures exist: (1) clusters of
“Cray-style” vector supercomputers; (2) clusters of scalar uni- and
multi-processors. Clusters are in transition from (a) massively parallel
computers and clusters running proprietary software to (b) proprietary clusters
running standard software, and (c) do-it-yourself Beowulf clusters built from
commodity hardware and software. In 2001, only five years after its
introduction, Beowulf has mobilized a community around a standard architecture
and tools. Beowulf’s economics and sociology are poised to kill off the other
two architectural lines – and will likely affect traditional super-computer
centers as well. Peer-to-peer and Grid communities provide significant
advantages for embarrassingly parallel problems and sharing vast numbers of
files. The Computational Grid can federate systems into supercomputers far
beyond the power of any current computing center. The centers will become
super-data and super-application centers. While these trends make
high-performance computing much less expensive and much more accessible, there
is a dark side. Clusters perform poorly on applications that require large
shared memory. Although there is vibrant computer architecture activity on
microprocessors and on high-end cellular architectures, we appear to be
entering an era of super-computing mono-culture. Investing in next generation
software and hardware supercomputer architecture is essential to improve the
efficiency and efficacy of systems.
Digital Immortality doc
or
pdf (10/1/2000)
Gordon and I wrote a piece on the immortality spectrum: one-way immortality
(passing knowledge on to future generations) at one end, and two-way
immortality at the other, where part of you moves to cyberspace and continues
to learn, evolve, and interact with future generations. It is a
thought-piece for a "special" CACM issue.
A River System (Tobias Mayr of Cornell) (12/14/2000)
Data rivers are a good abstraction for processing large numbers
(billions) of records in parallel. Tobias Mayr, a PhD student at Cornell
visiting BARC in the fall of 2000, designed and started building a river
system. This small web
site describes the current status of that work.
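Below is a minimal sketch of the river idea in Python (mine, for illustration only; it is not Tobias's system and the names are made up): producers pour records into a shared stream, and a pool of consumers drains it in parallel, so the degree of parallelism is independent of how the records are produced.

```python
# Illustrative data-river sketch: a bounded shared stream plus parallel consumers.
import queue
import threading

river = queue.Queue(maxsize=10_000)   # the "river": a bounded stream of records
DONE = object()                        # sentinel marking the end of the stream

def producer(records):
    for rec in records:
        river.put(rec)

def consumer(results):
    while True:
        rec = river.get()
        if rec is DONE:
            break
        results.append(rec * 2)        # stand-in for real per-record processing

results = []
consumers = [threading.Thread(target=consumer, args=(results,)) for _ in range(4)]
for c in consumers:
    c.start()
producer(range(1_000))
for _ in consumers:
    river.put(DONE)                    # one sentinel per consumer
for c in consumers:
    c.join()
print(len(results))                    # 1000 records processed by 4 parallel consumers
```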
The 10,000$ Terabyte, and
IO studies of Windows2000 (with Leonard Chung) (6/2/2000)
Leonard Chung (an intern from UC Berkeley) and I studied the
performance of modern disks (SCSI and IDE) in comparison to the 1997 study of
Erik Riedel. The conclusions are interesting: IDE disks (with their
controllers) deliver good performance at less than half the price of SCSI. One can
package them in servers (8 to a box) and deliver very impressive performance.
Using 40 GB IDE drives, we can deliver a served terabyte for about 10,000$
(packaged, powered, and networked). RAID costs about 2x more, which is
approximately the cost of an un-RAIDed SCSI terabyte. The details are at
IO Studies.
The 1,000$ Terabyte is here with TeraScale Sneakernet. This work continues with our plans to rebuild the TerraServer with SATA CyberBricks.
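For the curious, here is a back-of-envelope sketch of the served-terabyte arithmetic: the 40 GB drive size and the 8-drives-per-box packaging come from the text above, while the per-component prices are illustrative guesses, not measured figures.

```python
# Back-of-envelope sketch of the "10,000$ served terabyte" arithmetic.
DRIVE_GB = 40            # 40 GB IDE drives (from the text)
DRIVES_PER_BOX = 8       # "8 to a box" (from the text)

IDE_DRIVE_PRICE = 130    # assumed $/drive (hypothetical circa-2000 IDE pricing)
BOX_PRICE = 1500         # assumed $/box: CPU, memory, controllers, NIC (hypothetical)

def served_terabyte_cost(tb=1.0):
    drives = -(-int(tb * 1000) // DRIVE_GB)    # ceil(1000 GB / 40 GB) = 25 drives
    boxes = -(-drives // DRIVES_PER_BOX)       # ceil(25 / 8) = 4 boxes
    return drives * IDE_DRIVE_PRICE + boxes * BOX_PRICE

print(served_terabyte_cost())   # ~9,250 $ -- in the ballpark of the quoted 10,000$/TB
```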
4 PetaBumps (2/15/1999)
In February 1999, U. Washington (Steve Corbato and others),
ISI-East (Terry Gibbons and others), QWest, Pacific Northwest Gigapop,
DARPA's SuperNet, and Microsoft (Ahmed Talat, Maher Saba, Stephen Dahl,
Alessandro Forin, and I) collaborated to set a "land speed record" for tcp/ip
(winning the first
Internet2 Land Speed Record). The experiment connected two workstations
with SysKonnect Gigabit Ethernet via 10 SuperNet hops (Arlington, NYC, San
Francisco, Seattle, Redmond). The systems delivered 750 Mbps in a single
tcp/ip stream (28 GB sent in 5 minutes) and about 900 Mbps when a second stream was
used. This was over a distance of 5600 km, and so gives the metric 3 PetaBumps
(petabit meters per second). It was "standard" tcp/ip but had two settings:
"jumbo" frames in the routers (4470 bytes rather than the usual 1500 bytes), which give the
endpoints fewer interrupts, and a window size of 20 MB (since
the round-trip time was 97 ms, you need a window that large to hold all the
in-flight bits). The details are described in the submissions to the Internet2
committee.
The single-stream submission:
Windows2000_I2_land_Speed_Contest_Entry_(Single_Stream_mail).htm
The multi-stream submission:
Windows2000_I2_land_Speed_Contest_Entry_(Multi_Stream_mail).htm
The code: speedy.htm, speedy.h, speedy.c
And a PowerPoint presentation about it:
Windows2000_WAN_Speed_Record.ppt (500KB)
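As a rough illustration of the window-size arithmetic, here is the bandwidth-delay product for a 1 Gbps link with the 97 ms round-trip time from the experiment (a sketch; the exact record settings are in the submissions above).

```python
# Minimal sketch of the window-size arithmetic behind the 20 MB TCP window above.
LINK_RATE_BPS = 1e9      # Gigabit Ethernet line rate
RTT_SECONDS = 0.097      # measured round-trip time from the text

# Bandwidth-delay product: bits that are "in flight" on the wire at any instant.
in_flight_bits = LINK_RATE_BPS * RTT_SECONDS   # ~97 Mbits
in_flight_bytes = in_flight_bits / 8           # ~12 MB

print(f"bandwidth-delay product ~= {in_flight_bytes/2**20:.1f} MB")
# The 20 MB window used in the record entry comfortably covers this product,
# so the sender never stalls waiting for acknowledgments.
```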
This was an extension of some work we did last fall (0.5 PetaBumps). With U. Washington, Research TV, Windows2000, Juniper, Alteon, SysKonnect, NTON, DARPA, Qwest, Nortel Networks, Pacific Northwest GigaPOP, and SC99, we demonstrated 1.3 Gbps (gigabits per second) desktop-to-desktop end-user performance over a LAN, MAN (30 km), and WAN (300 km) using commodity hardware and software, standard WinSock + tcp/ip, and 5 tcp/ip streams. Here are: the press release, the white paper in word (210KB) or PDF (780KB), and a PowerPoint presentation (500KB)
(12/20/99)
Rules of Thumb in Data Engineering
A paper with Prashant Shenoy, titled "Rules of Thumb in Data
Engineering," revisits Amdahl's laws and Gilder's law, and investigates the
economics of caching disk and Internet data.
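As one illustration of the flavor of that paper (my paraphrase, not a quote of its numbers): Amdahl's balance rules say a system wants roughly one byte of memory and one bit per second of IO bandwidth for each instruction per second. A small sketch with a hypothetical server:

```python
# Hedged sketch of Amdahl's balanced-system ratios (approximate rules of thumb).
def balance_report(mips, memory_mb, io_mbps):
    """Compare a system against Amdahl's rules of thumb."""
    memory_ratio = memory_mb / mips          # target ~1 MB per MIPS
    io_ratio = (io_mbps * 8) / mips          # target ~1 Mbit/s per MIPS
    return memory_ratio, io_ratio

# Hypothetical circa-2000 commodity server: 1000 MIPS, 512 MB RAM, 50 MB/s of disk IO.
mem, io = balance_report(mips=1000, memory_mb=512, io_mbps=50)
print(f"memory: {mem:.2f} MB/MIPS (rule of thumb ~1), "
      f"IO: {io:.2f} Mbit/s per MIPS (rule of thumb ~1)")
```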
(12/15/99)
Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS
Wrote, with Bill Devlin, Bill Laing, and George Spix, a short piece
trying to define a vocabulary for scalable systems: geoplexes, farms,
clones, partitions, packs, RACS, and RAPS. The paper defines each
of these terms and discusses the design tradeoffs of using clones, partitions,
and packs.
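To give a feel for the distinction (an illustrative sketch of mine, not code from the paper): a request can go to any clone, while a key picks exactly one partition.

```python
# Clones are interchangeable replicas (RACS): any one can serve a request.
# Partitions each own a disjoint slice of the data (RAPS): the key picks the node.
import random
import hashlib

CLONES = ["web1", "web2", "web3"]       # identical stateless front ends (hypothetical names)
PARTITIONS = ["db-a_m", "db-n_z"]       # each owns part of the data (hypothetical names)

def route_to_clone():
    # Any clone will do; a load balancer might round-robin instead of choosing randomly.
    return random.choice(CLONES)

def route_to_partition(key: str):
    # The key determines the partition; a simple hash stands in for range mapping.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return PARTITIONS[h % len(PARTITIONS)]

print(route_to_clone())              # e.g. "web2"
print(route_to_partition("gray"))    # always the same node for the same key
```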
Large Spatial Databases: I have been investigating large databases like the TerraServer, which is documented in two Microsoft technical reports:
We have been operating the TerraServer (http://TerraService.Net/) since June 1998. At this point we have served over 4 billion web hits and 20 terabytes of geospatial images. We are working with Alex Szalay of Johns Hopkins on a similar system to make the Sloan Digital Sky Survey images available on the web as they arrive over the next six years. Our research plan for handling these 40 terabytes of data over the next five years is described in the report Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey, Alexander S. Szalay, Peter Kunszt, Ani Thakar, Jim Gray. MSR-TR-99-30.
In addition, here are some interesting air photos of the Microsoft Redmond campus.
This is a 1995 proposal for an Alternate Architecture for EOS DIS (the 15 PB database NASA is building). Here is my PowerPoint summary of the report (250KB).
WindowsClusters: I believe you can build supercomputers as a cluster of commodity hardware and software modules. A cluster is a collection of independent computers that is as easy to use as a single computer. Managers see it as a single system, programmers see it as a single system, and users see it as a single system. The software spreads data and computation among the nodes of the cluster. When a node fails, other nodes provide the services and data formerly provided by the missing node. When a node is added or repaired, the cluster software migrates some data and computation to that node.
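A toy sketch of that failover behavior (illustrative only; this is not how MSCS is implemented): each service has an owning node, and when a node stops heartbeating, its services are reassigned to the survivors.

```python
# Toy heartbeat-based failover: reassign services owned by a silent node.
import time

nodes = {"node1": time.time(), "node2": time.time()}   # node -> last heartbeat
ownership = {"sql": "node1", "fileshare": "node2"}      # service -> owning node
HEARTBEAT_TIMEOUT = 15                                  # illustrative timeout in seconds

def check_and_failover(now):
    dead = [n for n, beat in nodes.items() if now - beat > HEARTBEAT_TIMEOUT]
    survivors = [n for n in nodes if n not in dead]
    for service, owner in ownership.items():
        if owner in dead and survivors:
            ownership[service] = survivors[0]           # restart service on a survivor
    return ownership

# Simulate node1 going silent for 20 seconds:
nodes["node1"] -= 20
print(check_and_failover(time.time()))                  # {'sql': 'node2', 'fileshare': 'node2'}
```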
My personal (1995) research plan is contained in the document Clusters95.doc. It has evolved into a larger enterprise involving many groups within Microsoft and many of our hardware and software partners. My research is a small (and independent) fragment of the larger NT clusters effort led by Rod Gamache in the NT group: Wolfpack_Compcon.doc (500KB) and a PowerPoint presentation of Wolfpack Clusters by Mark Wood, WolfPack Clusters.ppt. (7/3/97) That effort is now called Microsoft Cluster Services and has its own web site. Researchers at Cornell University, the MSCS team, and the BARC team wrote a joint paper summarizing MSCS for the Fault Tolerant Computing Symposium. Here is a copy of that paper: MSCS_FTCS98.doc (144KB)
We demonstrated SQL Server failover on NT Clusters: SQL_Server_Availability.ppt (3MB). The WindowsNT failover time is about 15 seconds; SQL Server failover takes longer if the transaction log contains a lot of undo/redo work. Here is a white paper describing our design: SQL_Server_Clustering_Whitepaper.doc
In 1997, Microsoft showed off many scalability solutions: a one-node terabyte geo-spatial database server (the TerraServer) and a 45-node cluster doing a billion transactions per day. Also shown were SAP + SQL + NT-Cluster failover demos, a 50 GB mail store, a 50,000-user POP3 mail server, a 100-million-hits-per-day web server, and a 64-bit-addressing SQL Server. Here are some white papers related to that event: (5/24/97)
A 1998 revision of the SQL Server Scalability white paper is SQL_Scales.doc (800 KB) or the zip version: SQL_Scales.zip (300 KB).
There is much more about this at the Microsoft site http://www.microsoft.com/ntserver/ProductInfo/Enterprise/scalability.asp
I wrote a short paper on storage metrics (joint with Goetz Graefe) discussing optimal page sizes, buffer pool sizes, and DRAM/disk tradeoffs, to appear in SIGMOD Record: 5_min_rule_SIGMOD.doc (.3MB Office97 MS Word file).
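The heart of that paper is a break-even calculation; here is a sketch of the arithmetic with illustrative prices and disk rates (not the paper's exact numbers).

```python
# Sketch of the break-even ("five-minute rule") arithmetic for caching a page in DRAM.
PAGES_PER_MB_OF_DRAM = 128        # 8 KB pages
ACCESSES_PER_SEC_PER_DISK = 64    # random IOs/s a disk can deliver (assumed)
PRICE_PER_DISK = 2000             # $ per disk drive (assumed)
PRICE_PER_MB_OF_DRAM = 15         # $ per MB of DRAM (assumed)

# Keep a page in DRAM if it is re-referenced more often than this interval;
# otherwise it is cheaper to leave it on disk and re-read it on demand.
break_even_seconds = (PAGES_PER_MB_OF_DRAM / ACCESSES_PER_SEC_PER_DISK) * \
                     (PRICE_PER_DISK / PRICE_PER_MB_OF_DRAM)

print(f"break-even reference interval ~= {break_even_seconds/60:.0f} minutes")  # roughly 4-5 minutes
```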
Erik Riedel of CMU, Catharine van Ingen, and I have been investigating the best ways to move bulk data on an NT file system. Our experimental results and a paper describing them are at Sequential_IO. (7/28/98) You may also find the PennySort.doc (400 KB) paper interesting -- how to do IO cheaply!
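For a feel of the PennySort arithmetic: the benchmark asks how many 100-byte records you can sort for one cent of machine time, with the machine price amortized over three years. A sketch with assumed prices and sort rates (illustrative, not our benchmark results):

```python
# Hedged sketch of the PennySort arithmetic.
SYSTEM_PRICE = 1500                      # $ for the whole machine (assumed)
THREE_YEARS_SECONDS = 3 * 365 * 24 * 3600
SORT_RATE_RECORDS_PER_SEC = 300_000      # 100-byte records/s the machine can sort (assumed)

seconds_per_penny = THREE_YEARS_SECONDS * (0.01 / SYSTEM_PRICE)   # ~630 s of machine time
records_for_a_penny = SORT_RATE_RECORDS_PER_SEC * seconds_per_penny

print(f"{seconds_per_penny:.0f} s of machine time per penny, "
      f"~{records_for_a_penny/1e6:.0f} million records sorted")
```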
Database Systems: Database systems provide an ideal application to drive the scalability and availability techniques of clustered systems. The data is partitioned and replicated among the nodes. A high-level database language gives a location independent programming interface to the data. If there are many small requests, as in transaction processing systems, then there is natural parallelism within the computation. If there are a few large requests, then the database compiler can translate the high-level database program into a parallel execution plan. CacmParallelDB.doc.
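Here is an illustrative sketch (mine, not from the CACM paper) of that intra-query parallelism: the table is hash-partitioned across nodes, each node scans and partially aggregates its own partition, and the partial results are combined.

```python
# Partitioned parallel aggregation: each "node" sums its own partition of the data.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(partition):
    """Each 'node' scans only its own partition."""
    return sum(row["amount"] for row in partition)

def parallel_total(rows, num_partitions=4):
    # Hash-partition the rows (a stand-in for how a parallel DBMS spreads data over nodes).
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row["key"]) % num_partitions].append(row)
    with ProcessPoolExecutor(max_workers=num_partitions) as pool:
        return sum(pool.map(partial_sum, partitions))

if __name__ == "__main__":
    data = [{"key": i, "amount": i % 10} for i in range(100_000)]
    print(parallel_total(data))   # same answer as a serial sum, computed in parallel
```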
Performance: I helped define the early database and transaction processing benchmarks (TPC A, B, and C). I edited the Benchmark Handbook for Databases and Transaction Processing, which is now online as a website at http://www.benchmarkresources.com/handbook/, managed by Brian Butler. (12/12/98) I am an enthusiastic follower of the emerging database benchmarks of the Transaction Processing Performance Council. I am the web master for the Sort-Benchmark web site. For 1998, Chris Nyberg and I did the first PennySort benchmark: PennySort.doc (400 KB).
Transaction Processing: Andreas Reuter and I wrote the book Transaction Processing: Concepts and Techniques. Here are the errata for the 5th printing: TP_Book_Errata_9.htm (17KB) or in Word: TP_Book_Errata_9.doc (50KB) (5/20/2001) I am working with Microsoft's Viper team, which built distributed transactions into NT. Andreas and I taught two courses from the book at Stanford Summer Schools (with many other instructors). The course notes are at WICS99 and WICS96. I helped organize the High Performance Transaction Processing Workshop at Asilomar (9/5/99). The web site makes interesting reading.