No Frames version of homepage from 1999

Jim Gray
Microsoft Research, Bay Area Research Center
San Francisco, California, USA

Jim Gray is a senior researcher and distinguished engineer in Microsoft's Scaleable Servers Research Group
and manager of Microsoft's Bay Area Research Center (BARC).

Email: Gray@microsoft.com
Internet site: Gray
Intranet site: http://team/sites/barc
Office: 455 Market St, Suite 1690, San Francisco, CA. 94105
Phone: (415) 778-8222
Fax: (415) 778-8210


Research Interests

The World-Wide Telescope: Building the virtual astronomy observatory of the future (10/1/2001)
Astronomers are collecting huge quantities of data, and they are starting to federate this data. They held a Virtual Observatory conference in Pasadena (see also the conference book) to discuss the scientific and technical aspects of building a virtual observatory that would give anyone anywhere access to all the online astronomy data. More recently (10/15/2001) Alex Szalay and I wrote a "general audience" piece for Science Magazine (V.293, pp. 2037-2038, 14 Sept 2001; also available as MS-TR-2001-77 in Word or PDF). This will create a "virtual" telescope on the sky (with great response time): information at your fingertips for astronomers and for anyone else. Tom Barclay, Alex Szalay, and I gave an overview talk at the Microsoft Faculty Summit that sketches this idea.

I gave a talk on computer technology, arguing for online disks (rather than nearline tape), cheap processor and storage CyberBricks, and heavy use of automatic parallelism via database technology. The talk's slides are PowerPoint (330KB), and an extended abstract of the talk is at Word (330KB) and PDF (200KB). There is a paper related to the Sloan Digital Sky Survey: "Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey", Alexander S. Szalay, Peter Kunszt, Ani Thakar, Jim Gray, MSword (220KB) or PDF (230 KB).

I am working closely with Alex Szalay and other astronomers to add spatial data support to SQL Server (working with Peter Kunszt to add his HTM (Hierarchical Triangular Mesh) spatial data access to SQL Server), to build a public-outreach "sky server" that has the early Sloan data (http://skyServer.sdss.org), and on data mining. Papers on this are in preparation; ask for them if you are interested.
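How an HTM index locates a point can be sketched in a few lines: the sphere starts as the 8 faces of an octahedron, and each spherical triangle ("trixel") is recursively split into 4 by joining its edge midpoints, so a point's trixel name at depth d is a short string that can serve as a spatial index key. This is a minimal Python sketch, assuming unit-vector input; the root names (N0..N3, S0..S3), the child numbering, and the functions `htm_lookup` and `radec_to_vec` are illustrative assumptions and may not match the numbering in Peter Kunszt's library.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(a: Vec) -> Vec:
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def margin(p: Vec, tri) -> float:
    """Smallest signed edge test; >= 0 means p is inside the counterclockwise trixel."""
    v0, v1, v2 = tri
    return min(dot(cross(v0, v1), p), dot(cross(v1, v2), p), dot(cross(v2, v0), p))

# Octahedron vertices and the 8 root trixels, counterclockwise seen from outside.
V = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
ROOTS = {
    "S0": (V[1], V[5], V[2]), "S1": (V[2], V[5], V[3]),
    "S2": (V[3], V[5], V[4]), "S3": (V[4], V[5], V[1]),
    "N0": (V[1], V[0], V[4]), "N1": (V[4], V[0], V[3]),
    "N2": (V[3], V[0], V[2]), "N3": (V[2], V[0], V[1]),
}

def children(tri):
    """Split a trixel into 4 using the normalized midpoints of its edges."""
    v0, v1, v2 = tri
    w0 = normalize(tuple(a + b for a, b in zip(v1, v2)))
    w1 = normalize(tuple(a + b for a, b in zip(v0, v2)))
    w2 = normalize(tuple(a + b for a, b in zip(v0, v1)))
    return [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]

def htm_lookup(p: Vec, depth: int):
    """Return (name, triangle) of the depth-level trixel containing unit vector p."""
    # Only a trixel that contains p has a non-negative margin, so take the max.
    name, tri = max(ROOTS.items(), key=lambda kv: margin(p, kv[1]))
    for _ in range(depth):
        kids = children(tri)
        i = max(range(4), key=lambda k: margin(p, kids[k]))
        name, tri = name + str(i), kids[i]
    return name, tri

def radec_to_vec(ra_deg: float, dec_deg: float) -> Vec:
    """Right ascension / declination in degrees to a unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))
```

For example, `htm_lookup(radec_to_vec(45, 45), 3)` returns a five-character name beginning with "N3". Objects close together on the sky usually share long name prefixes, which is what lets an ordinary B-tree on the name act as a spatial index.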

The Future of super-computing and computers (8/1/2001)

Digital Immortality (Gordon Bell was principal author) (10/1/2000)

Gordon and I wrote a piece on the immortality spectrum between passing knowledge on to future generations: one way immortatlity at one end, and actually interacting with future generations via two-way immortality where part of you moves to cyberspace and continues to learn and evolve. It is a thought-piece for a "far out" CACM issue. Click here to download the paper MSR_TR_101 Microsoft Word document (50KB) or Adobe Acrobat (30KB).

A River System (Tobias Mayr of Cornell) (12/14/2000)

Data rivers are a good abstraction for processing large numbers (billions) of records in parallel. Tobias Mayr, a PhD student at Cornell visiting BARC in the fall of 2000, designed and started building a river system. This small web site describes the current status of that work.
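The river idea can be illustrated with a tiny in-process sketch: several producers pour records into one bounded queue (the "river"), and several consumers drain it at their own pace, so a faster consumer naturally takes more of the work. This is not Tobias Mayr's system; `run_river` and its parameters are hypothetical, and Python threads stand in for the separate machines and disks a real river would span.

```python
import queue
import threading
from typing import Callable, Iterable, List

_END = object()  # end-of-stream marker, one per consumer

def run_river(sources: List[Iterable[int]], work: Callable[[int], int],
              n_consumers: int = 4) -> List[int]:
    """Hypothetical river: producers pour records into one bounded queue, consumers drain it."""
    river: queue.Queue = queue.Queue(maxsize=1024)  # bounded, so fast producers wait (back-pressure)
    results: List[int] = []
    lock = threading.Lock()

    def produce(src: Iterable[int]) -> None:
        for rec in src:
            river.put(rec)

    def consume() -> None:
        local = []
        while True:
            rec = river.get()
            if rec is _END:
                break
            local.append(work(rec))
        with lock:
            results.extend(local)

    producers = [threading.Thread(target=produce, args=(s,)) for s in sources]
    consumers = [threading.Thread(target=consume) for _ in range(n_consumers)]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()
    for _ in consumers:
        river.put(_END)  # all data is already queued (FIFO), so each consumer stops after it
    for t in consumers:
        t.join()
    return results
```

Usage: `run_river([range(0, 1000), range(1000, 2000)], lambda r: r * r)` squares 2,000 records using two producers and four consumers; the output order depends on scheduling, so callers that need order must sort.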

The $10,000 Terabyte, and IO studies of Windows2000 (with Leonard Chung) (6/2/2000)

Leonard Chung (an intern from UC Berkeley) and I studied the performance of modern disks (SCSI and IDE) in comparison to the 1997 study by Erik Riedel. The conclusions are interesting: IDE disks (with their controllers) deliver good performance at less than half the price. One can package them in servers (8 to a box) and deliver very impressive performance. Using 40 GB IDE drives, we can deliver a served terabyte for about $10,000 (packaged, powered, and networked). RAID costs about 2x more, which is approximately the cost of an un-RAIDed SCSI terabyte. The details are at IO Studies, and the research report is available as MSword (1.3 MB) or PDF (500KB).
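The drive and server counts behind a "served terabyte" follow directly from the figures in the paragraph above (40 GB IDE drives, 8 per server, about $10,000 per terabyte). This short Python sketch does that arithmetic; treating RAID as mirroring, which doubles the drive count, is an assumption for illustration, and the study's own RAID configuration may differ.

```python
import math

def drives_and_boxes(capacity_gb: float, drive_gb: float = 40,
                     drives_per_box: int = 8, mirrored: bool = False):
    """Return (drives, servers) needed to hold capacity_gb of data."""
    drives = math.ceil(capacity_gb / drive_gb)
    if mirrored:
        drives *= 2  # assumption: RAID-1 style mirroring keeps two copies of every block
    boxes = math.ceil(drives / drives_per_box)
    return drives, boxes

plain = drives_and_boxes(1000)                     # 25 drives in 4 servers
mirrored = drives_and_boxes(1000, mirrored=True)   # 50 drives in 7 servers
dollars_per_gb = 10_000 / 1000                     # about $10 per served GB (1 TB = 1,000 GB)
```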

4 PetaBumps (2/15/2000)

In mid-February, U. Washington (Steve Corbato and others), ISI-East (Terry Gibbons and others), Qwest, Pacific Northwest GigaPOP, DARPA's SuperNet, and Microsoft (Ahmed Talat, Maher Saba, Stephen Dahl, Alessandro Forin, and I) collaborated to set a "land speed record" for tcp/ip; the team won the first Internet2 Land Speed Record. The experiment connected two workstations with SysKonnect Gigabit Ethernet via 10 SuperNet hops (Arlington, NYC, San Francisco, Seattle, Redmond). The systems delivered 750 Mbps in a single tcp/ip stream (28 GB sent in 5 minutes) and about 900 Mbps when a second stream was used. This was over a distance of 5,600 km, which gives about 4 PetaBumps (peta bit-meters per second: 750 Mbps × 5,600 km ≈ 4.2 × 10^15). It was "standard" tcp/ip with two settings changed: "jumbo" frames in the routers (4470 bytes rather than 1550 bytes), which give the endpoints fewer interrupts, and a 20 MB window size (since the round-trip time was 97 ms, you need a large window to hold all the "in flight" bits). The details are described in the submissions to the Internet2 committee.
The single-stream submission: Windows2000_I2_land_Speed_Contest_Entry_(Single_Stream_mail).htm
The multi-stream submission: Windows2000_I2_land_Speed_Contest_Entry_(Multi_Stream_mail).htm
The code: speedy.htm , speedy.h speedy.c
A PowerPoint presentation about it: Windows2000_WAN_Speed_Record.ppt (500KB)
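The PetaBump metric is throughput times distance, and the window size follows from the bandwidth-delay product: the number of bits that must be in flight during one round trip to keep the pipe full. A small Python check using the single-stream numbers quoted in the land-speed paragraph (the helper names are illustrative): 750 Mbps over 5,600 km is about 4.2 × 10^15 bit-meters per second, and 750 Mbps × 97 ms is roughly 9 MB in flight, so a 20 MB window leaves room to spare.

```python
def petabumps(bits_per_second: float, distance_m: float) -> float:
    """Throughput times distance, in units of 10**15 bit-meters per second."""
    return bits_per_second * distance_m / 1e15

def bandwidth_delay_bytes(bits_per_second: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the pipe full: rate x round-trip time / 8."""
    return bits_per_second * rtt_s / 8

rate = 750e6        # single-stream throughput, bits per second
distance = 5600e3   # path length, meters
rtt = 0.097         # round-trip time, seconds

score = petabumps(rate, distance)                        # about 4.2
in_flight_mb = bandwidth_delay_bytes(rate, rtt) / 1e6    # about 9.1 MB, well under a 20 MB window
observed_mbps = 28e9 * 8 / (5 * 60) / 1e6                # 28 GB in 5 minutes: about 747 Mbps
```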

This extends some work we did last fall (0.5 PetaBumps): with U. Washington, Research TV, Windows2000, Juniper, Alteon, SysKonnect, NTON, DARPA, Qwest, Nortel Networks, Pacific Northwest GigaPOP, and SC99, we demonstrated 1.3 Gbps (gigabits per second) desktop-to-desktop end-user performance over a LAN, a MAN (30 km), and a WAN (300 km) using commodity hardware and software, standard WinSock + tcp/ip, and 5 tcp/ip streams.
Here are:
The press release
The white paper word (210KB) or PDF (780KB),
and PowerPoint Presentation (500KB)

(12/20/99) Rules of Thumb in Data Engineering

A paper with Prashant Shenoy, titled "Rules of Thumb in Data Engineering," that revisits Amdahl's laws and Gilder's laws, and investigates the economics of caching disk and internet data. Here is the 150KB MSword, and the 80 KB PDF.

(12/15/99) Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS

Bill Devlin, Bill Laing, George Spix, and I wrote a short piece that tries to define a vocabulary for scaleable systems: Geoplexes, Farms, Clones, RACS, RAPS, Partitions, and Packs. The paper defines each of these terms and discusses the design tradeoffs of using clones, partitions, and packs. Here is the 350KB MSword, and the 300 KB PDF.
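To make the clone/partition distinction concrete, here is a toy Python router: a request that any replica can serve goes round-robin to a live clone (a RACS), while a keyed request is hashed to its partition and served by the first live node in that partition's pack (a RAPS). The `Farm` class, its methods, and the SHA-256 hash partitioning are illustrative assumptions, not the paper's design.

```python
import hashlib
from typing import List

class Farm:
    """Hypothetical router over cloned services and packs of partitions."""

    def __init__(self, clones: List[str], packs: List[List[str]]):
        self.clones = clones   # identical replicas: any one can serve a request
        self.packs = packs     # packs[i] = nodes able to serve partition i, in failover order
        self.down: set = set()
        self._next = 0

    def route_clone(self) -> str:
        """Round-robin over live clones (RACS): a failed clone is simply skipped."""
        for _ in range(len(self.clones)):
            node = self.clones[self._next % len(self.clones)]
            self._next += 1
            if node not in self.down:
                return node
        raise RuntimeError("no live clone")

    def partition_of(self, key: str) -> int:
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.packs)

    def route_partition(self, key: str) -> str:
        """Keyed request (RAPS): only the key's pack can serve it; take its first live node."""
        for node in self.packs[self.partition_of(key)]:
            if node not in self.down:
                return node
        raise RuntimeError("partition unavailable")
```

The design tradeoff shows up directly: losing a clone only reduces capacity, while losing a whole pack makes that partition's data unavailable, which is why each partition is served by a pack rather than a single node.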

Web sites I manage:

Large Spatial Databases:  I have been investigating large databases like the TerraServer which is documented in two Microsoft technical reports:

We have been operating the TerraServer (http://www.terraserver.microsoft.com/) since June 1998. At this point we have served over 4 billion web hits and 20 terabytes of geospatial images. We are working with Alex Szalay of Johns Hopkins on a similar system to make the Sloan Digital Sky Survey images available on the web as they arrive over the next six years. Our research plan for handling these 40 terabytes of data over the next five years is described in the report "Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey," Alexander S. Szalay, Peter Kunszt, Ani Thakar, Jim Gray. MSR-TR-99-30 The source is MSword (220KB) or PDF (230 KB).

In addition, here are some interesting air photos of the Microsoft Redmond Campus.

This is a 1995 proposed Alternate Architecture for EOS DIS (the 15 PB database NASA is building). Here is my PowerPoint summary of the report (250KB).

WindowsClusters: I believe you can build supercomputers as a cluster of commodity hardware and software modules. A cluster is a collection of independent computers that is as easy to use as a single computer. Managers see it as a single system, programmers see it as a single system, and users see it as a single system. The software spreads data and computation among the nodes of the cluster. When a node fails, other nodes provide the services and data formerly provided by the missing node. When a node is added or repaired, the cluster software migrates some data and computation to that node.
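The failover and re-integration behavior described above can be sketched as ownership of named resources (databases, file shares, addresses) moving between nodes. In this minimal Python sketch, a failed node's resources go to the least-loaded survivors, and a repaired or added node takes work from over-loaded nodes. The function names and the balancing policy are hypothetical; real cluster software also handles membership, heartbeats, quorum, and disk ownership, which are omitted here.

```python
from typing import Dict, List

def assign(resources: List[str], nodes: List[str]) -> Dict[str, str]:
    """Initial placement: spread resources round-robin over the nodes."""
    return {r: nodes[i % len(nodes)] for i, r in enumerate(sorted(resources))}

def _loads(owner: Dict[str, str], nodes: List[str]) -> Dict[str, int]:
    load = {n: 0 for n in nodes}
    for n in owner.values():
        if n in load:
            load[n] += 1
    return load

def fail_over(owner: Dict[str, str], failed: str, survivors: List[str]) -> Dict[str, str]:
    """Give each of the failed node's resources to the currently least-loaded survivor."""
    new = {r: n for r, n in owner.items() if n != failed}
    load = _loads(new, survivors)
    for r in sorted(r for r, n in owner.items() if n == failed):
        target = min(survivors, key=lambda n: (load[n], n))
        new[r] = target
        load[target] += 1
    return new

def rejoin(owner: Dict[str, str], node: str, members: List[str]) -> Dict[str, str]:
    """When a node is added or repaired, migrate resources to it from over-loaded nodes."""
    nodes = members + [node]
    new = dict(owner)
    load = _loads(new, nodes)
    fair = len(new) // len(nodes)
    for r in sorted(new):
        if load[node] >= fair:
            break
        src = new[r]
        if load[src] > fair:
            new[r] = node
            load[src] -= 1
            load[node] += 1
    return new
```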

My personal (1995) research plan is contained in the document: Clusters95.doc. It has evolved into a larger enterprise involving many groups within Microsoft, and many of our hardware and software partners. My research is a small (and independent) fragment of the larger NTclusters effort led by Rod Gamache in the NT group: Wolfpack_Compcon.doc (500KB), and a PowerPoint presentation of Wolfpack Clusters by Mark Wood, WolfPack Clusters.ppt. (7/3/97) That effort is now called Microsoft Cluster Services and has its own web site. Researchers at Cornell University, the MSCS team, and the BARC team wrote a joint paper summarizing MSCS for the Fault Tolerant Computing Symposium. Here is a copy of that paper MSCS_FTCS98.doc (144KB)

We demonstrated SQL Server failover on NT Clusters: SQL_Server_Availability.ppt (3MB). The WindowsNT failover time is about 15 seconds; SQL Server failover takes longer if the transaction log contains a lot of undo/redo work. Here is a white paper describing our design: SQL_Server_Clustering_Whitepaper.doc
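Why failover time grows with the log: the node that takes over must replay the transaction log, redoing logged updates and undoing those of transactions that never committed. This minimal Python sketch assumes a simplified, hypothetical log format (tuples) and strict two-phase locking, so a loser's update is never followed by a committed update to the same key. The returned `work` count grows with the number of update records to replay, which is the effect the failover paragraph describes.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified log records:
#   ("update", txn, key, old_value, new_value)
#   ("commit", txn)
LogRec = Tuple

def recover(log: List[LogRec], db: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Redo all logged updates, then undo those of uncommitted transactions.

    Returns the recovered database and the number of log records applied.
    """
    db = dict(db)
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    work = 0
    for rec in log:                        # redo pass: repeat history in log order
        if rec[0] == "update":
            _, _, key, _, new = rec
            db[key] = new
            work += 1
    for rec in reversed(log):              # undo pass: roll back losers, newest first
        if rec[0] == "update" and rec[1] not in committed:
            _, _, key, old, _ = rec
            db[key] = old
            work += 1
    return db, work
```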

In 1997, Microsoft showed off many scalability solutions: a one-node terabyte geo-spatial database server (the TerraServer), and a 45-node cluster doing a billion transactions per day. SAP + SQL + NT-Cluster failover demos, a 50 GB mail store, a 50k-user POP3 mail server, a 100-million-hits-per-day web server, and 64-bit addressing SQL Server were also shown. Here are some white papers related to that event (5/24/97):

A 1998 revision of the SQL Server Scalability white paper is SQL_Scales.doc (800 KB) or the zip version: SQL_Scales.zip (300 KB).

There is much more about this at the Microsoft site http://www.microsoft.com/windows2000/guide/datacenter/overview/default.asp

I wrote a short paper on storage metrics (joint with Goetz Graefe) discussing optimal page sizes, buffer pool sizes, and DRAM/disk tradeoffs, to appear in SIGMOD RECORD: 5_min_rule_SIGMOD.doc (.3MB Office97 MS Word file).
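The heart of the storage-metrics discussion is a break-even calculation: keep a page in RAM if it is re-referenced more often than

BreakEvenInterval = (PagesPerMBofRAM / AccessesPerSecondPerDisk) × (PricePerDiskDrive / PricePerMBofRAM).

This Python sketch evaluates that formula. The input values are illustrative assumptions in a late-1990s range, not figures quoted from the paper; with them the interval comes out near 4.4 minutes.

```python
def break_even_seconds(pages_per_mb_ram: float, accesses_per_sec_per_disk: float,
                       price_per_disk: float, price_per_mb_ram: float) -> float:
    """Re-reference interval at which caching a page in RAM costs the same as re-reading it.

    Pages referenced more often than once per this many seconds are cheaper to keep in RAM.
    """
    return (pages_per_mb_ram / accesses_per_sec_per_disk) * (price_per_disk / price_per_mb_ram)

# Illustrative inputs (assumed): 8 KB pages, 64 random IOs/s per disk,
# a $2,000 disk drive, and $15 per MB of DRAM.
interval = break_even_seconds(1024 / 8, 64, 2000, 15)   # about 267 seconds
```

Larger pages mean fewer pages per MB, and so a shorter break-even interval, which is why the rule is stated for a particular page size.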

Erik Riedel of CMU, Catharine van Ingen, and I have been investigating the best ways to move bulk data on an NT file system. Our experimental results and a paper describing them are at Sequential_IO. (7/28/98) You may also find the PennySort.doc (400 KB) paper interesting: how to do IO cheaply!

Database Systems: Database systems provide an ideal application to drive the scalability and availability techniques of clustered systems. The data is partitioned and replicated among the nodes. A high-level database language gives a location independent programming interface to the data. If there are many small requests, as in transaction processing systems, then there is natural parallelism within the computation. If there are a few large requests, then the database compiler can translate the high-level database program into a parallel execution plan. CacmParallelDB.doc.
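A parallel execution plan for a large request can be sketched as partitioned aggregation: rows are hash-partitioned by the grouping key, each node sums its own partition, and a final step merges the partial results. In this Python sketch, threads stand in for cluster nodes (so it shows the shape of the plan, not real speedup), and the function names are illustrative.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (group key, amount)

def partition(rows: Iterable[Row], n: int) -> List[List[Row]]:
    """Hash-partition rows by key so each key lands on exactly one node."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[0]) % n].append(row)
    return parts

def local_sum(part: List[Row]) -> Counter:
    """The per-node step: SUM(amount) GROUP BY key over one partition."""
    totals: Counter = Counter()
    for key, amount in part:
        totals[key] += amount
    return totals

def parallel_group_sum(rows: Iterable[Row], n_nodes: int = 4) -> Dict[str, int]:
    parts = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum, parts))
    merged: Counter = Counter()
    for p in partials:      # merge step: combine the partial aggregates
        merged.update(p)
    return dict(merged)
```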

Performance: I helped define the early database and transaction processing benchmarks (TPC A, B, and C). I edit the Benchmark Handbook for Databases and Transaction Processing, and am an enthusiastic follower of the emerging database benchmarks from the Transaction Processing Performance Council. The Benchmark Handbook has become a web site (Benchmark handbook web site) managed by Brian Butler. (12/12/98) I am the web master for the Sort-Benchmark web site. For 1998, Chris Nyberg and I did the first PennySort benchmark PennySort.doc (400 KB).
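PennySort sets its time budget by price. Assuming the sort-benchmark rule that the system's list price is depreciated over three years, one penny buys a fixed number of seconds of system time, and the benchmark measures how much data can be sorted in that time. A Python sketch of the budget calculation, with a hypothetical $1,000 system price:

```python
SECONDS_IN_3_YEARS = 3 * 365 * 24 * 3600   # depreciation period (leap days ignored)

def penny_budget_seconds(system_price_dollars: float) -> float:
    """Seconds of system time worth one cent when the price is spread over three years."""
    return 0.01 * SECONDS_IN_3_YEARS / system_price_dollars

budget = penny_budget_seconds(1000)   # hypothetical $1,000 system: about 946 seconds
```

Cheaper systems get longer budgets, which is why the benchmark rewards commodity hardware.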

Transaction Processing: Andreas Reuter and I wrote the book Transaction Processing Concepts and Techniques . Here are the errata for the 5th printing: TP_Book_Errata_9.htm (17KB) or in word TP_Book_Errata_9.doc (50KB) (5/20/2001) I am working with Microsoft's Viper team that built distributed transactions into NT. Andreas and I taught two courses from the book at Stanford Summer Schools (with many other instructors). The course notes are at WICS99 and WICS96. I helped organize the High Performance Transaction Processing Workshop at Asilomar (9/5/99). The web site makes interesting reading.

External Activities

Publication Boards

Morgan Kaufmann Data Management Series, Editor

Data Mining and Knowledge Discovery, Editor

Moderator of the Database Section of the Computer science online Research Repository (CoRR)

Advisory Boards

Presidential Advisory Committee on Information Technology The committee's web site is http://www.ccic.gov/ac and the final report is at http://www.ccic.gov/ac/report/. Here is an MSword version of the report.  This committee has stakeholders from everywhere in the IT industry (telcos, industry, academe, citizens, government, ...), 27 people in all. They observe that:

The committee recommends:

The report is very dull reading (a 5-page executive summary makes 35 recommendations!). But it is important for the wider IT community to appreciate that this is the best we are going to get from a committee. There is much good hiding in the report. I encourage you to look at it (at least the executive summary).

A narrower but related work is the Asilomar Report on Database Research. Fifteen DB researchers got together for a 3-day retreat to think about how we do DB research and how we should do it in the future. The paper is short and an easy read. Here is the HTML (34KB) and MSword (66KB).

GriPhyN Research Consortium Executive Council member (they are building tools for the international Physics Computational Grid)

Ordinal Corporation (they sort fast), Board Member

Member of Library of Congress advisory group to develop digital Information Infrastructure and Preservation Program

Emeritus

VLDB Journal, Editor in Chief and Endowment Board

National Research Council, Computer Science and Telecommunications Board

National Research Council, Computer Science and Telecommunications Board study of the Library of Congress.

Stanford University School of Engineering, Advisory Board Member

Societies

Association of Computing Machinery, Fellow

National Academy of Engineering, Member

Research Collaborations

Working with USGS and Aerial Images on the TerraServer project.

Working with Alex Szalay of Johns Hopkins on the Sloan Digital Sky Survey Archive.

Co-investigator with NASA sponsored Earth Science Data Analysis project led by Jim Frew.

Working on ACM subcommittee chaired by Joseph Halpern of Cornell to put all CS research articles on the web --CoRR ( http://xxx.lanl.gov/archive/cs ), and moderator of the Database Section of the Computer science online Research Repository (CoRR)

Microsoft sponsors the following university research efforts (and I monitor the grants):

Infrastructure

Andreas Reuter's International University, the first university in Germany to adopt the American university structure, offers courses in English and focuses on Information technology and Business Administration.

Michael Ley's online archives and index of computer science literature at the University of Trier.

Alex Szalay at Johns Hopkins, FermiLab, and Sloan Digital Sky Survey building the Sky Server.

Information at your fingertips (aka: databases, data mining, the web,...)

Marti Hearst at UC Berkeley, WebTango: Automatically Evaluating Web Site Designs

Olvi Mangasarian at Wisconsin, The Data Mining Institute

Johannes Gehrke at Cornell, Online Data-Mining Operators.

Mike Franklin at UC Berkeley, Processing Continuous Queries in a Distributed Environment
Dan Suciu at University of Washington, An XML Transformer Toolkit

Scalable Computing

Werner Vogels at Cornell, Galaxy: scaleable cluster management.

Publications

What follows are mostly MS Word and PowerPoint documents. Free viewers for PowerPoint and Word are available from Microsoft for Windows and MacOS clients. These can work as plugins to Netscape, Spry, Microsoft Internet Explorer, and other free browsers. Here are pointers to viewers for Word and PowerPoint.

Turing Lecture: My Turing Award Lecture, "What Next? A dozen remaining IT problems," was presented at the ACM Federated Computing Research Conference in Atlanta, Georgia, on 4 May 1999. A refined version of it was re-presented at the SIGMOD conference in Philadelphia in June, and at many universities (perhaps it should be called the Touring Award :). There is an accompanying article that goes with the talk. Criticism and suggestions on it are very welcome. Here are versions of the talk in the popular formats: Gray_Turing_FCRC.ppt (5 MB) or Gray_Turing_FCRC.pdf (2.8 MB) or Gray_Turing_FCRC.htm (30 KB at a time, 7 MB in all). A text version of the talk is available as a Microsoft Word document or an Adobe Acrobat document.

Clusters, see also the Scaleable servers web page for additional references

(5/20/97): Some technical details of the TerraServer and the billion transactions per day system: Two_Commodity_Scaleable_Servers.doc (3.5MB) or the zip version Two_Commodity_Scaleable_Servers.zip (2 MB).

(12/28/97): Updated SQL Server Scalability document SQL_Scales.doc (6MB) or the zip version: SQL_Scales.zip (250 KB).

(5/20/97): Updated SQL NT-Cluster white paper: SQL_Server_Clustering_Whitepaper.doc

Researchers at Cornell University, the MSCS team, and the BARC team wrote a joint paper summarizing MSCS for the Fault Tolerant Computing Symposium. Here is a copy of that paper MSCS_FTCS98.doc (144KB)

(4/1/97): A white paper (Word Office 97 format) describing SQL Server clustering, SQL_Server_Clustering_Whitepaper.doc (500KB), and a nice white paper by the Wolfpack team on their design (Rob Short, Rod Gamache, John Vert, and Mike Massa) that appeared at CompCon97: Wolfpack_Compcon (300KB Word97 doc).

We demonstrated SQL Server failover on NT Clusters SQL_Server_Availability.ppt (3MB). This is partly NT Clusters work and partly SQL work.

Database Systems & Issues

For 1998, Chris Nyberg and I did the first PennySort benchmark PennySort.doc (400 KB).

The Microsoft TerraServer (www.TerraServer.microsoft.com) is on the Internet. It is a terabyte geo-spatial database. We at BARC (particularly Barclay, Slutz, VanIngen, and Gray) in cooperation with the USGS and the Russian Space Agency have been working on this for over 2 years. The following white paper describes the application design (it's BIG): TerraServer_TR.doc (8MB).

Erik Riedel of CMU, Catharine van Ingen, and I have been investigating the best ways to move bulk data on an NT file system. Our experimental results and a paper describing them is at Sequential_IO.

A short paper on storage metrics (joint with Goetz Graefe) discussing optimal page sizes, buffer pool sizes, and DRAM/disk tradeoffs, to appear in SIGMOD RECORD: 5_min_rule_SIGMOD.doc (.3MB Office97 MS Word file).

Nsort is a fast sorting program on SGI-IRIX systems. This system can sort 5GB/minute (that's very fast!!!). This paper is joint with Chris Nyberg and Charles Koester: Nsort white paper.

Parallel Database Systems tutorial (SIGMOD 95 and VLDB 95): paper: PDB95.doc (140 KB) and talk PDB95.ppt (3.5 MB)

DataCube.doc (155 KB) A new relational operator used for data mining. The talk is: DataCube.ppt (800KB)
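The cube operator generalizes GROUP BY: for N grouping columns it computes the aggregate for all 2^N subsets of them, marking a rolled-up column with the value ALL. A naive Python sketch with made-up car-sales rows; `cube` is an illustrative name, and this version rescans the data for every grouping, whereas practical implementations compute coarser groupings from finer ones.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any, Dict, List, Sequence, Tuple

ALL = "ALL"  # placeholder for a column that has been aggregated away

def cube(rows: List[Dict[str, Any]], dims: Sequence[str], measure: str) -> Dict[Tuple, float]:
    """SUM(measure) over every subset of dims (2**len(dims) group-bys)."""
    out: Dict[Tuple, float] = defaultdict(float)
    for k in range(len(dims) + 1):
        for grouping in combinations(dims, k):
            for row in rows:
                key = tuple(row[d] if d in grouping else ALL for d in dims)
                out[key] += row[measure]
    return dict(out)

sales = [  # illustrative data only
    {"model": "Chevy", "year": 1994, "units": 5},
    {"model": "Chevy", "year": 1995, "units": 7},
    {"model": "Ford", "year": 1994, "units": 3},
]
result = cube(sales, ["model", "year"], "units")
# result[("ALL", "ALL")] is the grand total; result[("Chevy", "ALL")] totals all Chevy years.
```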

Replicas.doc (510 KB) Analyzes the scalability of replication algorithms. The PostScript version is replicas.ps (900KB), an RTF version is replicas.rtf (1.3MB), and the talk is replica.ppt (450KB). In addition, Brad Hammond of Microsoft wrote a nice (related) paper on how Microsoft's Access and Visual Basic products do replication in a mobile application environment: WingmanReplicas.doc (135KB).

Isolation.doc (110 KB) Analyzes the concurrency control models of SQL and other systems. Introduces a new notion called SnapShot isolation.
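Snapshot isolation can be sketched with a toy multiversion store: each transaction reads the versions committed before it started, and at commit time it aborts if another transaction has already committed a newer version of any key it wrote ("first committer wins"). The `MVStore` and `Txn` classes are illustrative, not taken from the paper. Because only write-write conflicts are checked, the sketch, like snapshot isolation itself, still permits write skew.

```python
from typing import Any, Dict, List, Tuple

class MVStore:
    """Toy multiversion store: key -> list of (commit_ts, value), oldest first."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[Tuple[int, Any]]] = {}
        self.clock = 0

    def begin(self) -> "Txn":
        return Txn(self, self.clock)

class Txn:
    def __init__(self, store: MVStore, start_ts: int) -> None:
        self.store, self.start_ts = store, start_ts
        self.writes: Dict[str, Any] = {}

    def read(self, key: str) -> Any:
        if key in self.writes:                 # read your own writes
            return self.writes[key]
        visible = [v for ts, v in self.store.versions.get(key, []) if ts <= self.start_ts]
        return visible[-1] if visible else None   # newest version in the snapshot

    def write(self, key: str, value: Any) -> None:
        self.writes[key] = value               # buffered until commit

    def commit(self) -> bool:
        s = self.store
        for key in self.writes:
            if any(ts > self.start_ts for ts, _ in s.versions.get(key, [])):
                return False                   # a concurrent txn committed this key first: abort
        s.clock += 1
        for key, value in self.writes.items():
            s.versions.setdefault(key, []).append((s.clock, value))
        return True
```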

ParallelDBLoad.doc (195 KB) Describes a prototype that loads data into a database in parallel at 1TB/day. SyntheticDataGen.doc (200 KB) or PDF (100KB) Explains how to generate and index a synthetic database in parallel at a billion records per hour.
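One way to generate unique keys in parallel, in scrambled order, is a multiplicative congruential generator x → g·x mod P with P prime and g a primitive root mod P: the sequence visits every value 1..P-1 exactly once, and worker k can jump straight to its starting point with modular exponentiation. This Python sketch uses a tiny prime (101, for which 2 is a primitive root) so the result is easy to check; the constants and the `chunk` function are illustrative, and the paper's exact generator may differ.

```python
from typing import List

P = 101  # small prime modulus for illustration; a real generator uses a prime >= the table size
G = 2    # a primitive root mod 101, so G**i mod P cycles through all of 1..P-1

def chunk(worker: int, n_workers: int) -> List[int]:
    """Keys produced by one worker; together the workers emit each of 1..P-1 exactly once."""
    total = P - 1
    size = -(-total // n_workers)          # ceiling division: steps per worker
    first = worker * size
    count = max(0, min(size, total - first))
    x = pow(G, first, P)                   # jump to this worker's start without stepping
    keys = []
    for _ in range(count):
        keys.append(x)
        x = (x * G) % P                    # next element of the congruential sequence
    return keys
```

Each worker needs only its index and the worker count, so no coordination is required, and the keys come out dense and unique but not in sorted order.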

AlphaSort.doc (104 KB) Sorts 1.2 GB (12 M records) in a minute.
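A key idea behind cache-friendly sorting is to sort compact (key-prefix, pointer) entries rather than whole records, consulting the full key only when prefixes tie. This Python sketch shows that data layout, assuming benchmark-style 100-byte records with 10-byte keys and a 4-byte prefix; in Python it illustrates the structure only and gets none of the processor-cache benefit, and `prefix_sort` is an illustrative name rather than AlphaSort's code.

```python
from itertools import groupby
from typing import List

def prefix_sort(records: List[bytes], key_len: int = 10, prefix_len: int = 4) -> List[bytes]:
    """Sort records by their first key_len bytes using (prefix, index) entries."""
    # Phase 1: sort small (prefix, index) entries; the index plays the role of a record pointer.
    entries = sorted(((r[:prefix_len], i) for i, r in enumerate(records)), key=lambda e: e[0])
    out: List[bytes] = []
    # Phase 2: only runs of equal prefixes need the full key to break ties.
    for _, run in groupby(entries, key=lambda e: e[0]):
        idxs = [i for _, i in run]
        if len(idxs) > 1:
            idxs.sort(key=lambda i: records[i][:key_len])
        out.extend(records[i] for i in idxs)
    return out
```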

History of System R (text of 20 year reunion): Text of the 20 year reunion of the folks who built the first SQL system.

Database Research Directions (Lagunita II): (170 KB Postscript) A 1995 report by senior database researchers on our community's agenda for the rest of the decade.

Article on the history of data management for IEEE Spectrum 50th Anniversary issue in Microsoft Word format: DB_History.doc and the same file as HTML (some formatting lost): DB_History.html.

Transaction Processing

Argues that queues are best implemented atop a database system rather than being a free-standing subsystem QueueIsDB.doc (22 KB) . Talk is QueueIsDB.ppt (80KB)
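The queue-in-a-database argument is easy to see in miniature: enqueue is an INSERT, dequeue is a SELECT plus DELETE inside one transaction, and the database supplies durability, locking, and recovery. This Python sketch uses the standard-library `sqlite3` module as a stand-in database; the table layout and function names are illustrative assumptions, not the design in the paper.

```python
import sqlite3

def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit; transactions are explicit
    conn.execute("CREATE TABLE IF NOT EXISTS q ("
                 "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)")
    return conn

def enqueue(conn: sqlite3.Connection, body: str) -> None:
    conn.execute("INSERT INTO q (body) VALUES (?)", (body,))

def dequeue(conn: sqlite3.Connection):
    """Atomically remove and return the oldest message, or None if the queue is empty."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading the head
    try:
        row = conn.execute("SELECT id, body FROM q ORDER BY id LIMIT 1").fetchone()
        if row is not None:
            conn.execute("DELETE FROM q WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
        return None if row is None else row[1]
    except Exception:
        conn.execute("ROLLBACK")     # a failed dequeue leaves the message in the queue
        raise
```

Because the dequeue is an ordinary transaction, the application's own updates could be placed inside the same transaction, so a message is consumed if and only if its work commits.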

Slides of a lecture on transactions for Eric Brewer's OS class at UC Berkeley. Transactions_Tutorial.ppt (1 MB)

Slides for the 1996 one-week class on Transaction Processing taught by Andreas Reuter, Charles Levine, Robert Orfali, Pete Homan, and Pat Helland. This is a tour of Andreas's and my book on transaction processing: WICS TP class at Stanford Summer School. There is also the 1999 version of the WICS TP class, with Andreas Reuter, Dieter Gawlick, Susan Malaika, Greg Hope, Dan Harkey, Phil Bernstein, and Charles Levine. I helped organize the High Performance Transaction Processing Workshop at Asilomar (9/5/99). The web site makes interesting reading.

GeoSpatial Databases

See top of page re TerraServer and Sloan Digital Sky Survey.

Architecture

I am enamored of smart disks. Several of my recent talks make the point. Perhaps the most focused talk is: Gray_NASD_Talk.ppt

For 1998, Chris Nyberg and I did the first PennySort benchmark PennySort.doc (400 KB). It shows what you can do with commodity cyberbricks today.

Gordon Bell and I wrote a paper for the 50th anniversary of the ACM, talking about what computing might be like 50 years from now: can you spell chutzpah? Here it is: revolution.doc (150KB).

NC Servers.doc (27kB), my take on the discussion of Network Computers. It argues that network bandwidth issues and centralized support issues make NCs unattractive for the WAN but fine for the LAN.

A study of the KSR COMA design on Oracle applications. COMA.ps (183 KB)

Electronic Journals

Moderator of the Database Section of the Computer science online Research Repository (CoRR)

The online SIGMOD Record and online VLDB Journal .

Upcoming talks

18 April: Talk on scaleable systems at NCSA, Urbana.

List of recent talks with slides.

Recommended articles

Vannevar Bush's paper: As We May Think

Richard Feynman's paper: There's Plenty of Room at the Bottom

Allen Newell on how to do research (one-hour video lecture)

Ed Lazowska (U. Washington) faculty lecture on Computer Science

Michael Lesk's paper on "How much information is there in the world?"

Dave Patterson's talk on Intelligent Disks.

A nice historical piece on Moore's law (90KB html): a 1996 student paper by Bob Schaller explaining the history of the law and some of its implications.

Andrew Odlyzko has a fascinating and insightful series of articles on online publishing: His papers are what got me started on my goal of getting all scientific literature on the web.




© 2000 Microsoft and/or its suppliers. All rights reserved. Terms of Use