
Microsoft TerraServer


Microsoft TerraServer is a geo-spatial database containing high-resolution aerial imagery and digitized topographic maps provided by the US Geological Survey (USGS).

The area covered in green in the map on the right identifies the locations where we have imagery or topographic maps. All imagery and topo map data is stored in a Microsoft SQL Server 2000 relational database. The TerraServer web application enables users to navigate the database by clicking on buttons and the imagery without any knowledge of SQL syntax. Currently, there are over 710 million rows of image "tiles" and meta-data in the TerraServer databases.

The TerraServer project is a "loosely formed consortia" of the following public companies and government agencies:

  • Microsoft: Microsoft Research designed and built the TerraServer database and application software. Tom Barclay is the lead developer and project manager for Microsoft TerraServer.

    The SQL Server development team provides operations and administrative support for the database software. The SQL team installs and tests beta versions of SQL Server on the production TerraServer database servers. Thus TerraServer is used to validate SQL Server releases prior to shipment to customers.

    The MSN Home Advisor group currently maintains the content of the web application, receives the credit for the "eyeballs" that visit the TerraServer web site, and pays the network and hosting charges. The TerraServer and Home Advisor databases are cross-indexed so Home Advisor users can "see the neighborhood" where a home is for sale and see homes for sale where they are viewing TerraServer imagery.

  • U.S. Geological Survey: The National Mapping Division provided the data (DOQ and DRG data-sets) and mapping expertise. The USGS contact is Beth Duff.
  • Compaq Computer Corporation: Compaq Enterprise Server Division provided the database cluster (4 Compaq ProLiant 8500 8-way processors), and the web server farm (8 Compaq DL360 dual processors). Compaq StorageWorks provided the Enterprise Storage Array of 18 TB of disk media. Compaq Enterprise Server contact is Mike Engbrock. Compaq StorageWorks contact is Will Monin.
  • ADIC: The Advanced Digital Information Corporation (ADIC) provided the Scalar 1000 tape library containing 4 Linear Tape Open (LTO) tape drives. TerraServer database data is regularly backed up to the ADIC Scalar 1000.
  • Veritas: The Veritas Corporation provided the NetBackup Enterprise Backup software program that works with SQL Server software and the ADIC tape library to back up the TerraServer databases.
  • Extreme Networks: The Extreme Networks Summit technology provides a suite of private networks for the TerraServer installation.
Each of the above organizations has a different motivation for participating in the project. The technology companies (Compaq, ADIC, Veritas, Extreme Networks, and product groups within Microsoft) are interested in testing new hardware and software in large and high-volume configurations. TerraServer runs on the latest hardware and software technology so that the project participants can demonstrate to their customers that their products scale nicely and are highly available. The USGS and Microsoft Research agenda is a bit different. Our motivation is to research new ideas and techniques. The USGS is looking for new ways and technologies to reach the public with USGS data-sets and information. TerraServer is the USGS' first attempt to deliver large raster data sets to the public in a non-paper form.

The TerraServer web site contains an About TerraServer section that describes the roles of the participants, the technology used to build TerraServer, and the technology we developed in MS Research to turn SQL Server into a high-volume image server. The remainder of this page is the history of the TerraServer project.

TerraServer History

The TerraServer project was commissioned in 1996 by Paul Flessner, then General Manager of the SQL Server group. The "Sphinx" project, which became SQL Server 7.0, was just getting underway. Effectively the group was re-writing the SQL Server product from the ground up to greatly improve SQL Server's scalability, availability, and reliability. At the same time, the Internet was really taking off as a popular medium for businesses and consumers. Our mission was to accomplish two things: (1) build an interactive, internet-based database application to test the scalability of a single database server running Windows NT 4.0 and Sphinx (SQL 7.0), and (2) demonstrate the resulting large application, Sphinx, Windows NT 4.0, and database server on the public internet. Thus the TerraServer project's requirements were:

  • BIG: 2.5 TB of data including catalog, temporary space, etc.
  • PUBLIC: available on the world wide web
  • INTERESTING: to a wide audience
  • ACCESSIBLE: using standard browsers (IE, Netscape)
  • REAL: a LOB (line of business) application (users can buy imagery)
  • FREE: cannot require an NDA or payment from a user for access
  • FAST: usable on low-speed (56 kbps) and high-speed (T-1+) connections
  • EASY: we do not want a large group to develop, deploy, or maintain the application

When we first started the project, we also had one other partner: Aerial Images of Raleigh, North Carolina. Aerial Images had a relationship with SOVINFORMSPUTNIK to sell de-classified, high-resolution Russian military satellite images taken of locations around the world. TerraServer began at the same time Aerial Images and SOVINFORMSPUTNIK were forming the SPIN-2 brand. Thus TerraServer became SPIN-2's "virtual show room". In exchange for an 18-month "lease" of the SPIN-2 high-resolution data, Microsoft Research built an E-commerce application for Aerial Images and licensed Aerial Images to run TerraServer application software on their computers after the 18-month lease expired. The 18-month lease expired in December 1999 and Aerial Images has been operating http://terraserver.com/ off and on since January 2000. Microsoft TerraServer has continued on exclusively with USGS imagery.

We did not set out to build the world's largest "image server" when we started the project. We evolved into being an image application based on our goals. As it turns out, it is very difficult to find an INTERESTING TERABYTE. It is easy to "make up" a synthesized TERABYTE data-set, or find a BORING TERABYTE, or find a NON-PUBLIC or EXPENSIVE TERABYTE. Finding a FREE, INTERESTING, PUBLIC TERABYTE was very challenging.

Because of our relationship with UC Santa Barbara's Earth Sciences and Digital Library departments, we were introduced to the USGS "Digital OrthoQuadrangle" digitized aerial imagery and the SPOT Image satellite data. We quickly learned the importance of high-resolution imagery when we compared a 10-meter resolution SPOT Image image of Fulton County Stadium with a 1-meter resolution USGS aerial image. The cold war had ended earlier in the decade, and the restrictions on the resolution of publicly accessible satellite imagery had recently been lifted by a treaty signed by President Clinton and President Yeltsin. We happened to stumble into a situation where technology (the Internet, high-speed modems, fast PCs) enabled consumers to view imagery at the same time that a legal barrier had been removed. Thus, we had no competition for a high-resolution image server of the earth.

It was well into 1997 before we began to design the database, write software, and form the partnerships listed at the top of this page. Initially only one company, Digital Equipment Corporation, could actually build a single computer system with a TERABYTE or more of disk storage running PC software. Digital provided an AlphaServer 4100 with 4 processors and 324 4GB hard drives. This system enabled us to write application software, test very early releases of SQL Server, and build a loading system to collate thousands of overlapping, huge aerial and satellite images into millions of edge-matched small image "tiles". The TerraServer database stores imagery in fixed-size, 200 pixel by 200 pixel image files. Each file is stored in a single Binary Large Object (BLOB) adjacent to its meta-data within the SQL Server database. Web pages are built using HTML tables to align multiple 200x200 pixel files into a larger image. Thus, we have dubbed the small, fixed-size files "image tiles" or "tiles".
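The cutting step described above can be sketched in a few lines of Python. This is only an illustration, not the TerraServer loader: the function name `cut_tiles`, the list-of-rows raster representation, and the choice to pad partial edge tiles with a fill value are all assumptions made for the example. Only the 200-pixel tile edge comes from the text.

```python
TILE = 200  # tile edge in pixels, as stated in the text


def cut_tiles(pixels, tile=TILE, fill=0):
    """Split a 2-D raster (a list of equal-length rows) into tile x tile blocks.

    Returns a dict mapping (column, row) tile indexes to blocks.
    Padding partial edge blocks with `fill` is an assumption of this sketch.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    tiles = {}
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            block = []
            for y in range(top, top + tile):
                # Rows past the bottom edge of the raster start out empty.
                row = pixels[y][left:left + tile] if y < height else []
                # Pad short rows so every block is exactly tile pixels wide.
                block.append(row + [fill] * (tile - len(row)))
            tiles[(left // tile, top // tile)] = block
    return tiles
```

In a database-backed design like the one described, each block would then be compressed and stored as one BLOB next to its meta-data, keyed by its tile indexes.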

In May 1997, Jim Gray, the manager of the Bay Area Research Center, demonstrated the TerraServer application in New York City at the "Scalability Day" marketing event. At the time, we only had 200 GB of imagery "in-house", which we loaded five times into the database in order to get to a tera-byte. We learned a lot from this demo. First, we realized that the AlphaServer 4100 was really beyond its storage limits with a tera-byte of disks. We really needed more processors and a more robust data backplane. Second, we needed to learn a lot more about image processing and geography. We didn't know that each 45 MB USGS imagery file overlaps the neighboring 45 MB files by about 300 pixels. During the demo, Jim had to carefully avoid navigating too far east, which would have shown a redundant copy of Building 3 of the Microsoft campus. Third, the application concept needed a little more work. At the time, we were creating 600 x 400 pixel image tiles and showing them one-at-a-time on a web page. We quickly discovered that we cut Bill Gates' house in half and could not display his home on a single web page.

We dubbed the application we built for the Scalability Day event "TerraServer V1.0". We immediately began work on fixing the three major problems found in the application:

  1. Digital Equipment Corporation replaced the 4x300 MHz processor AlphaServer 4100 with an 8x440 MHz processor AlphaServer 8400 with 10 GB of RAM. The twin tera-bit I/O channels were easily capable of managing the 2.5 TB RAID-5 configuration of 324 9 GB hard drives. This system arrived at Microsoft in December of 1997 and ran Windows NT 4.0 Enterprise Edition "out of the box".
  2. We met extensively with scientists and programmers at the US Geological Survey in Sioux Falls, South Dakota; Menlo Park, California; Denver, Colorado; and Reston, Virginia. They gave us a whirlwind education in the cartographic science of map projections and remote sensing issues. With help from Rick Szeliski, a colleague in Microsoft Research, we built our own graphics library that enabled us to accurately stitch together pixels extracted from multiple, large, uncompressed aerial/satellite images into collections of compressed image tile files. This enabled us to build an automated data loading system where we could move large images from tape, merge pixels from multiple large files, tile them into small files, and load the tiles directly into the database.
  3. We designed a new tiling scheme where we would display multiple tiles in an HTML table on a single web page. We provided buttons so the user could control how many tiles to display on the web page at one time. This fixed the "Bill Gates" bug: the user could now make a larger photograph and reposition their point of interest into the center of the page.
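The multi-tile page from item 3 can be sketched as a function that emits an HTML table of IMG tags around a chosen center tile. This is a hedged illustration: the function name `tile_table`, the `/tile?x=...&y=...` URL form, and the assumption that tile rows are numbered from top to bottom are invented for the example and are not taken from the TerraServer code.

```python
def tile_table(center_x, center_y, cols, rows, tile=200):
    """Return an HTML table showing cols x rows tiles around a center tile.

    The /tile?x=..&y=.. URL scheme is hypothetical; any handler that
    returns one JPEG tile per request would fit the same pattern.
    """
    first_x = center_x - cols // 2
    first_y = center_y - rows // 2
    lines = ['<table cellspacing="0" cellpadding="0" border="0">']
    for r in range(rows):
        cells = []
        for c in range(cols):
            x, y = first_x + c, first_y + r
            cells.append(
                f'<td><img src="/tile?x={x}&amp;y={y}" '
                f'width="{tile}" height="{tile}"></td>'
            )
        lines.append("<tr>" + "".join(cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

Letting the user change `cols` and `rows` corresponds to the buttons that control how many tiles appear, and changing the center tile re-centers the point of interest.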

We began loading data in earnest into our new DEC AlphaServer 8400 in January 1998. The Sphinx Team had released Beta 1 of Sphinx and we tested it heavily. The USGS had delivered approximately 4 TB of uncompressed imagery and SPIN-2 had delivered approximately 500 GB with more data on the way. In parallel to the loading effort, we wrote a new application using Active Server Page technology and Visual Basic Scripting Edition as the web application development language. We kept our custom-written IIS ISAPI DLL to rapidly fetch JPEG "tiles" out of the database and send them to web browsers in response to HTML IMG tag requests. We rolled a beta test out internally within Microsoft in May 1998 and formally launched the service on June 24, 1998. This release is dubbed TerraServer V2.0.

The largest mistake we made on TerraServer was underestimating its popularity. We initially planned for approximately 1 million "hits" per day. By "hit" we mean a web request that would cause a database access to build a web page or fetch an image tile. The "traditional" definition of a "hit" is any web request, such as one for a background JPEG image or a direction button on the TerraServer web page. We don't count these in our definition of hit. Based on the internal beta, we increased our estimate to 5 million hits per day. When we launched, USA Today ran a front page article on our web site and the local TV news in Washington D.C. ran a 120 second piece on the 6, 7, and 11pm news. Luckily, ABC network news was diverted on the way to our launch. Our 4 web servers and single DEC AlphaServer 8400 were overwhelmed with requests. We successfully handled 40 million hits the first day and we guess that we dropped about 20 to 40 million on the floor. The Sphinx development team was actively monitoring the database server, which was running Beta 2 of the Sphinx software. They were able to identify 6 bugs in the storage engine code. Frankly, there were no other tests at the time that put the database software under that much stress.

From a user's perspective, the TerraServer site performed poorly. By June 27th, we installed an additional 6 web servers, bringing the total to 10. The Sphinx team improved the performance of the beta version several fold within a span of a few days. For myself, I got a graduate education in how to tune Microsoft's IIS software and how to manage a "web farm" of servers. By June 30th, 6 days after we launched, we had the appropriate hardware for the volume of usage we were getting (the site had settled down to 15 million hits per day), and I had set the software parameters such that the web servers would not flood the database server with too much work. We found that governing the load volume to the backend server was the key to optimizing our performance. We left the site on July 1, 1998, not to return until September 1998, when we upgraded the Sphinx software to the Beta 3 release.
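The idea of governing how much work the web tier sends to the database can be illustrated with a small concurrency limiter. This is a generic sketch and not the IIS settings that were actually tuned: the class name `BackendGovernor`, its `run` method, and the choice to fail fast with `TimeoutError` when no slot frees up are assumptions made for the example.

```python
import threading


class BackendGovernor:
    """Cap the number of backend calls in flight at any one time."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def run(self, query, *args, timeout=None):
        """Call query(*args) once a slot is free.

        Waits up to `timeout` seconds for a slot (forever if None) and
        raises TimeoutError instead of piling more work on the backend.
        """
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("backend busy")
        try:
            return query(*args)
        finally:
            # Always give the slot back, even if the query raised.
            self._slots.release()
```

A web server process would create one governor with a small `max_in_flight` and route every database call through `run`, so that a burst of page requests queues at the web tier rather than at the database server.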

During the first year of TerraServer's operation, 23 million users visited the site, making it one of the most popular sites on the internet. For the first three months, we averaged 10 to 15 million hits per day. By September we had modified our load application to operate remotely while users were viewing data on-line. We initially had 30% coverage of the United States and a smattering of coverage of Europe. New data began to arrive from the USGS and SPIN-2 organizations. We also received 19,000 e-mails from happy and not so happy users. Most of the positive e-mails went like this: "I saw a picture of my house on TerraServer, how cool! This is one of the best uses of the Internet that I have seen, Thank You!" The negative e-mails were the opposite: "What is the matter with you idiots at Microsoft? Your TerraServer has no imagery of Cleveland Ohio. Don't you know that Cleveland is a major city, home of the Indians baseball team, the Cleveland Browns, and the rock-and-roll hall of fame. When will you guys wake up and add imagery of Cleveland?" That's pretty much how the mail went. If we had a picture of your house, then you loved the site. And if we didn't, then you hated it. Thus, we proved that "content is king" on the web.

A number of businesses and government agencies also contacted us asking for information on how we built TerraServer and offering feedback on what we did wrong. It became clear that we had hit on something important with TerraServer.

First, we found a number of companies that were managing millions of images like we were. They were using the file system to store their images and were running into management nightmares when their image assets approached one million. Fact is, file systems are designed to manage hundreds of thousands of objects and don't have the robust indexing, locking, logging, and other technologies to handle multiple millions of things. Thus we proved that relational databases were good repositories for imagery in addition to tabular data which they are well known for.

Second, we found that there was a professional or business interest in using TerraServer. While we had targeted the web application towards consumers and grade school students, there were a large number of professionals that used TerraServer on a daily basis. These users were interested in TerraServer for two things: (1) adding GIS information and capabilities, and (2) enabling programmable access to the TerraServer imagery data for use in their company's application. As it turns out, TerraServer is the only on-line copy of the USGS DOQ asset in one place. The effort we went through to stitch all the USGS data together into a seamless asset is quite valuable to a number of organizations.

The feedback also proved that our tiling scheme needed additional work. TerraServer V2.0 used a "fixed ground coordinate" system for sizing tiles using longitude and latitude as the unit of measure. Higher-resolution tiles were graphically 4 times bigger than their "zoomed out" cousins, which were in turn 4 times bigger than their own "zoomed out" cousins. While this tiling scheme made the load application simple to program, it caused weird side effects in the web application. The most important of these was that it was impossible to predict the physical size of the tiles and downright impossible to figure out the ground coordinate of any given pixel.

With this feedback in mind, we began work on TerraServer V3.0 in late fall of 1998. TerraServer V3.0 was targeted at changing the tiling scheme to accomplish the following:

  1. Create fixed sized tiles at each level of resolution (zoom level),
  2. Increase the number of zoom levels from 3 to 7,
  3. Support the native projection of the data-set instead of using our ground-based, variable sized tiling scheme,
  4. Simplify the load process so that pixel merging could be done using compressed imagery located in the database instead of original, uncompressed imagery stored in the file system,
  5. Enable data-sets maintained in the same projection system to be layered on top of other data-sets in the same projection system
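
With fixed-size tiles in a projected coordinate system, the ground coordinate of any pixel becomes simple arithmetic, which is the problem the V2.0 scheme could not solve. The sketch below is an assumption-laden illustration and not the V3.0 schema: it assumes 200-pixel tiles, a finest resolution of 1 meter per pixel that doubles at each of 7 zoom levels, a tile grid anchored at the projection origin, and pixel offsets counted from the tile's south-west corner. All function names are hypothetical.

```python
TILE = 200          # tile edge in pixels
BASE_RES_M = 1.0    # assumed finest resolution, meters per pixel
LEVELS = 7          # zoom levels 0..6, per the goal list above


def meters_per_pixel(level):
    """Assumed rule: resolution doubles with each zoom-out level."""
    if not 0 <= level < LEVELS:
        raise ValueError("unknown zoom level")
    return BASE_RES_M * (2 ** level)


def tile_for(easting, northing, level):
    """Index of the tile that covers a projected ground coordinate."""
    span = TILE * meters_per_pixel(level)  # ground meters covered by one tile edge
    return int(easting // span), int(northing // span)


def pixel_to_ground(level, tile_x, tile_y, px, py):
    """Ground coordinate of a pixel's south-west corner.

    px and py are offsets from the tile's south-west corner.
    """
    res = meters_per_pixel(level)
    return (tile_x * TILE + px) * res, (tile_y * TILE + py) * res
```

Because every tile at a given level covers the same ground span, data-sets that share a projection line up tile for tile, which is what makes the layering in goal 5 straightforward.
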
When we started the TerraServer V3.0 project, we had about 800 GB of data loaded into our 2.5 TB database server running TerraServer V2.0. We decided we would continue to load data into the TerraServer V2.0 database until it reached approximately 1.0 TB of user data. At the same time, we re-designed the database schema, re-wrote the load programs, and began loading from tape number 1 into the TerraServer V3.0 database stored on the same DEC AlphaServer 8400. So during the spring of 1999, the TerraServer database server was doing double duty by loading both the TerraServer V2.0 and TerraServer V3.0 databases while servicing read requests for TerraServer V2.0.

Remainder of this page is under construction... We experimented with a variety of tile sizes — 256x256, 250x250, 300x300, 150x150, and 200x200. We were looking for the "sweet spot" where the compressed size of the tile would be "sizable enough to be worth storing", small enough to download quickly, and visually correct so that we could fit a variety of monitor sizes such as 800x600, 1024x768, etc. We found that 200x200 pixel tiles hit the sweet spot.
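
One of the criteria above, fitting common monitor sizes, can be checked with simple arithmetic. The sketch below only counts how many whole tiles fit on a screen and how many pixels are left over. It ignores browser chrome and says nothing about compressed size or download time, so it is an illustration of one factor and not a reconstruction of how the 200x200 choice was made. The function names are hypothetical.

```python
CANDIDATE_TILES = [256, 250, 300, 150, 200]   # tile edges tried, from the text
MONITORS = [(800, 600), (1024, 768)]          # screen sizes named in the text


def whole_tiles(screen, tile):
    """Number of complete tiles that fit across and down the screen."""
    width, height = screen
    return width // tile, height // tile


def leftover(screen, tile):
    """Pixels left over horizontally and vertically after whole tiles."""
    width, height = screen
    return width % tile, height % tile


if __name__ == "__main__":
    for screen in MONITORS:
        for tile in CANDIDATE_TILES:
            print(screen, tile, whole_tiles(screen, tile), leftover(screen, tile))
```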


Copyright 2001 Microsoft Corporation.  Please address comments on this web site  to msrwww@microsoft.com. This server contains links to servers not under the control of Microsoft Corporation.