Microsoft TerraServer is a geo-spatial database containing high-resolution aerial imagery
and digitized topographic maps provided by the US Geological Survey (USGS).
The areas shaded green on the map at right show where we have imagery or topographic maps.
All imagery and topographic map data is stored in a Microsoft SQL Server 2000 relational database. The TerraServer web
application lets users navigate the database by clicking buttons and the imagery itself, without any knowledge of
SQL syntax. Currently, there are over 710 million rows of image "tiles" and metadata in the TerraServer databases.
The TerraServer project is a "loosely formed consortium" of the following public companies and government agencies:
The SQL Server development team provides operations and administrative support for the database software. The SQL team
installs and tests beta versions of SQL Server on the production TerraServer database servers. Thus TerraServer is used
to validate SQL Server releases prior to shipment to customers.
The MSN Home Advisor group currently maintains the content of the web application, receives the credit for the "eyeballs"
that visit the TerraServer web site, and pays the network and hosting charges. The TerraServer and Home Advisor databases are
cross-indexed, so Home Advisor users can "see the neighborhood" where a home is for sale, and TerraServer users can see homes
for sale in the area whose imagery they are viewing.
The TerraServer web site contains an About TerraServer section that
describes the roles of the participants, the technology used to build TerraServer, and the technology we developed in
Microsoft Research to turn SQL Server into a high-volume image server. The remainder of this page recounts the history of the TerraServer project.
The TerraServer project was commissioned in 1996 by Paul Flessner, then General Manager of the SQL Server group. The
"Sphinx" project, which became SQL Server 7.0, was just getting underway. Effectively, the group was rewriting
the SQL Server product from the ground up to greatly improve SQL Server's scalability, availability, and reliability. At
the same time, the Internet was taking off as a popular medium for businesses and consumers. Our mission was to accomplish
two things: (1) build an interactive, Internet-based database application to test the scalability of a single database server
running Windows NT 4.0 and Sphinx (SQL Server 7.0), and (2) demonstrate the resulting large application, Sphinx, Windows NT 4.0,
and the database server on the public Internet. Thus the TerraServer project's requirements were:
When we first started the project, we also had one other partner: Aerial Images, of Raleigh, North Carolina.
Aerial Images had a relationship with SOVINFORMSPUTNIK to sell declassified, high-resolution Russian military satellite images
taken of locations around the world. TerraServer began at the same time that Aerial Images and SOVINFORMSPUTNIK were forming the
SPIN-2 brand, so TerraServer became SPIN-2's "virtual showroom". In exchange for an 18-month "lease" of
the SPIN-2 high-resolution data, Microsoft Research built an e-commerce application for Aerial Images and licensed Aerial Images to
run the TerraServer application software on their own computers after the lease expired. The 18-month lease expired in December 1999, and
Aerial Images has been operating http://terraserver.com/ off and on since January 2000. Microsoft
TerraServer has continued on exclusively with USGS imagery.
We did not set out to build the world's largest "image server" when we started the project. We evolved into
an image application because of our goals. As it turns out, it is very difficult to find an INTERESTING TERABYTE. It
is easy to "make up" a synthesized TERABYTE dataset, or to find a BORING TERABYTE, a NON-PUBLIC TERABYTE, or an EXPENSIVE TERABYTE.
Finding a FREE, INTERESTING, PUBLIC TERABYTE was very challenging.
Because of our relationship with UC Santa Barbara's Earth
Sciences and Digital Library departments, we were introduced to the USGS "Digital OrthoQuadrangle" (DOQ) digitized
aerial imagery and the SPOT Image satellite data. We quickly learned the importance of high-resolution imagery when we
compared a 10-meter-resolution SPOT image of Fulton County Stadium with a 1-meter-resolution USGS aerial image.
The Cold War had ended earlier in the decade, and restrictions on the resolution of publicly accessible satellite imagery had
recently been lifted by a treaty signed by President Clinton and President Yeltsin. We happened to stumble into a situation where
technology that let consumers view imagery (the Internet, high-speed modems, fast PCs) arrived at the same time that a legal barrier
had been removed. Thus, we had no competition for a high-resolution image server of the earth.
It was well into 1997 before we began to design the database, write software, and form the partnerships listed at the top of
this page. Initially, only one company could actually build a single computer system with a TERABYTE or more of disk storage running
PC software: Digital Equipment Corporation. Digital provided an AlphaServer 4100 with 4 processors and 324 4 GB hard drives (about
1.3 TB in total). This system enabled us to write application software, test very early releases of SQL Server, and build a loading
system to collate thousands of overlapping, huge aerial and satellite images into millions of edge-matched small image "tiles". The
TerraServer database stores imagery in fixed-size, 200 by 200 pixel image files. Each file is stored in a single Binary Large Object
(BLOB) adjacent to its metadata within the SQL Server database. Web pages use HTML tables to align multiple 200x200 pixel files into a
larger image. Hence we dubbed the small, fixed-size files "image tiles", or just "tiles".
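The tile-and-table layout described above can be sketched in a few lines. This is a minimal Python illustration, not TerraServer's code (the V2.0 pages were written in ASP and VBScript); the `mosaic_html` function and the `/tile?x=<col>&y=<row>` URL scheme are hypothetical. It emits a borderless HTML table of 200x200 `IMG` tags that the browser renders as one larger image.

```python
TILE = 200  # tile edge length in pixels, as described in the text


def mosaic_html(x0: int, y0: int, cols: int, rows: int) -> str:
    """Return an HTML table that places cols x rows tiles edge to edge.

    (x0, y0) is the column/row index of the upper-left tile. The /tile
    URL scheme is hypothetical and used for illustration only.
    """
    lines = ['<table border="0" cellspacing="0" cellpadding="0">']
    for row in range(rows):
        cells = []
        for col in range(cols):
            x, y = x0 + col, y0 + row
            # No whitespace between cells, so no stray gaps between tiles.
            cells.append(
                f'<td><img src="/tile?x={x}&amp;y={y}" '
                f'width="{TILE}" height="{TILE}" alt=""></td>'
            )
        lines.append("<tr>" + "".join(cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    # A 3x2 mosaic is displayed as a single 600x400 pixel image.
    print(mosaic_html(x0=100, y0=50, cols=3, rows=2))
```

Zero `cellspacing` and `cellpadding` matter here: without them, browsers insert gaps between adjacent tiles and the seams show.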
In May 1997, Jim Gray, the manager of the Bay Area Research Center, demonstrated the TerraServer application
in New York City at the "Scalability Day" marketing event. At the time, we had only 200 GB of imagery "in-house", which we
loaded into the database five times to reach a terabyte. We learned a lot from this demo. First, we realized that the AlphaServer 4100
was stretched beyond its storage limits with a terabyte of disks. We needed more processors and a more robust data backplane.
Second, we needed to learn a lot more about image processing and geography. We didn't know that each 45 MB USGS image file overlaps
its neighbors by about 300 pixels. During the demo, Jim had to be careful not to navigate too far east, where a duplicate copy of
Building 3 on the Microsoft campus would have appeared. Third, the application concept needed more work. At the time, we were creating
600 x 400 pixel image tiles and showing them one at a time on a web page. We quickly discovered that we had cut Bill Gates' house in half
and could not display it on a single web page.
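The house problem is a tile-boundary problem: any feature that straddles a tile edge needs more than one tile to display. The hypothetical helper below lists the tiles that a pixel bounding box touches; the house coordinates in the example are invented for illustration.

```python
def tiles_covering(left: int, top: int, right: int, bottom: int,
                   tile_w: int, tile_h: int) -> list[tuple[int, int]]:
    """Return (col, row) indices of every tile touched by a pixel box.

    The box is half-open: it spans x in [left, right) and y in [top, bottom).
    """
    first_col, last_col = left // tile_w, (right - 1) // tile_w
    first_row, last_row = top // tile_h, (bottom - 1) // tile_h
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


if __name__ == "__main__":
    # A made-up 150x150 pixel "house" whose left edge sits 50 px before a seam.
    print(tiles_covering(550, 100, 700, 250, 600, 400))  # [(0, 0), (1, 0)]
    print(tiles_covering(550, 100, 700, 250, 200, 200))  # four 200x200 tiles
```

With one large tile per page, the first case cannot show the whole house; with a mosaic of small tiles, the page simply includes every tile the feature touches.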
We dubbed the application we built for the Scalability Day event "TerraServer V1.0". We immediately began
work on fixing the three major problems found in the application:
We began loading data in earnest into our new DEC AlphaServer 8400 in January 1998. The Sphinx team had released Beta 1 of Sphinx, and we
tested it heavily. The USGS had delivered approximately 4 TB of uncompressed imagery, and SPIN-2 had delivered approximately 500 GB, with
more data on the way. In parallel with the loading effort, we wrote a new application using Active Server Pages (ASP), with Visual Basic
Scripting Edition (VBScript) as the web application development language. We kept our custom-written IIS ISAPI DLL to rapidly fetch JPEG
"tiles" out of the database and send them to web browsers in response to HTML IMG tag requests. We rolled out an internal beta test within
Microsoft in May 1998 and formally launched the service on June 24, 1998. This release is dubbed TerraServer V2.0.
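The storage and fetch path (each tile kept as a BLOB next to its metadata and fetched by key for each IMG request) can be sketched as follows. This is a hypothetical, self-contained stand-in: it uses Python's built-in `sqlite3` instead of SQL Server and a plain function instead of an ISAPI DLL, and the table layout and the `t`/`s`/`x`/`y` query parameters are invented for illustration.

```python
import sqlite3
from urllib.parse import urlparse, parse_qs

# Hypothetical schema: metadata columns plus the compressed image in one row.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tile (
    theme INTEGER NOT NULL,  -- e.g. 1 = aerial photo, 2 = topo map (made-up codes)
    scale INTEGER NOT NULL,  -- zoom level
    x     INTEGER NOT NULL,  -- tile column
    y     INTEGER NOT NULL,  -- tile row
    jpeg  BLOB    NOT NULL,  -- compressed 200x200 image
    PRIMARY KEY (theme, scale, x, y)
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def put_tile(conn, theme: int, scale: int, x: int, y: int, jpeg: bytes) -> None:
    conn.execute("INSERT OR REPLACE INTO tile VALUES (?, ?, ?, ?, ?)",
                 (theme, scale, x, y, jpeg))
    conn.commit()


def handle_tile_request(conn, url: str) -> tuple[int, str, bytes]:
    """Map a request such as /tile?t=1&s=10&x=5&y=7 to (status, type, body)."""
    query = parse_qs(urlparse(url).query)
    try:
        key = tuple(int(query[k][0]) for k in ("t", "s", "x", "y"))
    except (KeyError, ValueError):
        return 400, "text/plain", b"bad tile request"
    row = conn.execute(
        "SELECT jpeg FROM tile WHERE theme = ? AND scale = ? AND x = ? AND y = ?",
        key,
    ).fetchone()
    if row is None:
        return 404, "text/plain", b"no such tile"
    return 200, "image/jpeg", bytes(row[0])


if __name__ == "__main__":
    conn = open_store()
    put_tile(conn, 1, 10, 5, 7, b"\xff\xd8 placeholder bytes")
    print(handle_tile_request(conn, "/tile?t=1&s=10&x=5&y=7")[0])  # 200
```

Keying tiles by (theme, scale, x, y) turns every IMG request into a single indexed lookup rather than a walk through millions of files.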
The largest mistake we made on TerraServer was underestimating its popularity. We initially planned for approximately 1 million "hits"
per day. By "hit" we mean a web request that causes a database access, either to build a web page or to fetch an image tile. The "traditional"
definition of a "hit" is any web request, including requests for a background JPEG image or a direction button on the TerraServer web page;
we don't count those. Based on the internal beta, we increased our estimate to 5 million hits per day. When we launched, USA Today ran a
front-page article on our web site, and the local TV news in Washington, D.C. ran a 120-second piece on the 6, 7, and 11 pm news. Luckily, ABC
network news was diverted on the way to our launch. Our 4 web servers and single DEC AlphaServer 8400 were overwhelmed with requests. We
successfully handled 40 million hits the first day, and we estimate that we dropped another 20 to 40 million on the floor. The Sphinx development
team was actively monitoring the database server, which was running Beta 2 of the Sphinx software. They were able to identify 6 bugs in the
storage engine code. Frankly, no other test at the time put the database software under that much stress.
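To put those daily figures in per-second terms, here is a back-of-the-envelope conversion. It averages over a full 86,400-second day, so actual peaks (for example, during the evening news) were presumably higher:

$$
\begin{aligned}
\text{planned:} \quad & \frac{1\times10^{6}\ \text{hits}}{86\,400\ \text{s}} \approx 11.6\ \text{hits/s} \\
\text{revised estimate:} \quad & \frac{5\times10^{6}\ \text{hits}}{86\,400\ \text{s}} \approx 57.9\ \text{hits/s} \\
\text{launch day (served):} \quad & \frac{40\times10^{6}\ \text{hits}}{86\,400\ \text{s}} \approx 463\ \text{hits/s}
\end{aligned}
$$

The traffic actually served on launch day averaged 40 times the original plan, before counting the 20 to 40 million requests that were dropped.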
From a user's perspective, the TerraServer site performed poorly. By June 27th, we had installed 6 additional web servers, bringing the total to 10.
The Sphinx team improved the performance of the beta version several fold within a few days. For my part, I got a graduate education
in how to tune Microsoft's IIS software and how to manage a "web farm" of servers. By June 30th, 6 days after we launched, we had the appropriate
hardware for the volume of usage we were getting (the site had settled down to 15 million hits per day), and I had set the software parameters
so that the web servers would not flood the database server with too much work. We found that governing the load sent to the backend server
was the key to optimizing our performance. We stepped away from the site on July 1, 1998, and did not return until September 1998, when we
upgraded the Sphinx software to the Beta 3 release.
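The text doesn't say which IIS parameters were tuned. As a generic illustration of the principle of governing the load sent to the backend, here is a minimal Python sketch that caps in-flight database calls with a counting semaphore. `BackendGovernor` and the cap values are hypothetical, and this is a swapped-in technique, not IIS's actual configuration mechanism.

```python
import threading
import time


class BackendGovernor:
    """Cap how many requests a web server sends to the database at once.

    Requests beyond the cap wait for a free slot instead of piling onto the
    database server. The cap is a tuning parameter (hypothetical here).
    """

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self.in_flight = 0  # calls currently running against the backend
        self.peak = 0       # highest concurrency observed

    def call(self, query_fn, *args):
        with self._slots:  # blocks while every slot is busy
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            try:
                return query_fn(*args)
            finally:
                with self._lock:
                    self.in_flight -= 1


if __name__ == "__main__":
    gov = BackendGovernor(max_in_flight=4)

    def fake_query(n):
        time.sleep(0.01)  # stand-in for a database round trip
        return n * n

    results = []
    threads = [
        threading.Thread(target=lambda i=i: results.append(gov.call(fake_query, i)))
        for i in range(20)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(results), "requests served; peak concurrency", gov.peak)
```

Queuing at the web tier trades a little latency for a database that never thrashes; the right cap depends on the backend hardware and has to be found by measurement.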
During the first year of TerraServer's operation, 23 million users visited the site, making it one of the most popular sites on the Internet.
For the first three months, we averaged 10 to 15 million hits per day. By September, we had modified our load application to operate remotely
while users were viewing data online. We initially had 30% coverage of the United States and a smattering of coverage in Europe. New data
began to arrive from the USGS and SPIN-2. We also received 19,000 e-mails from happy and not-so-happy users. Most of the positive e-mails
went like this: "I saw a picture of my house on TerraServer, how cool! This is one of the best uses of the Internet that I have seen. Thank you!"
The negative e-mails were the opposite: "What is the matter with you idiots at Microsoft? Your TerraServer has no imagery of Cleveland, Ohio. Don't
you know that Cleveland is a major city, home of the Indians baseball team, the Cleveland Browns, and the Rock and Roll Hall of Fame? When
will you guys wake up and add imagery of Cleveland?" That's pretty much how the mail went. If we had a picture of your house, you loved the site.
If we didn't, you hated it. Thus, we proved that "content is king" on the web.
A number of businesses and government agencies also contacted us, asking how we built TerraServer and
giving feedback on what we did wrong. It became clear that we had hit on something important with TerraServer.
First, we found
a number of companies that were managing millions of images, as we were. They were using the file system to store their
images and were running into management nightmares as their image assets approached one million. In fact, file systems
are designed to manage hundreds of thousands of objects and lack the robust indexing, locking, logging, and other
technologies needed to handle many millions of things. Thus we proved that relational databases are good repositories for imagery,
in addition to the tabular data they are well known for.
Second, we found that there was a professional or business need for TerraServer. While we had targeted
the web application at consumers and grade school students, a large number of professionals used TerraServer
on a daily basis. These users wanted two things from TerraServer: (1) added GIS information and capabilities, and (2)
programmatic access to the TerraServer imagery for use in their companies' applications. As it turns out,
TerraServer is the only online copy of the USGS DOQ collection assembled in one place. The effort we went through to stitch
all the USGS imagery together into a seamless asset is quite valuable to a number of organizations.
The feedback also showed that our tiling scheme needed additional work. TerraServer V2.0 used a "fixed ground coordinate"
system for sizing tiles, with longitude and latitude as the unit of measure. Higher-resolution tiles were graphically 4 times bigger
than their "zoomed out" cousins, which were in turn 4 times bigger than their own "zoomed out" cousins. While this tiling scheme made
the load application simple to program, it caused weird side effects in the web application. The most important was that it was impossible
to predict the physical size of the tiles, and downright impossible to figure out the ground coordinate of any given pixel.
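The page does not spell out V2.0's tile math, but one well-known effect of sizing tiles in degrees is easy to show. The sketch below uses a spherical-Earth approximation, and its 0.01-degree span is a hypothetical value, not V2.0's actual tile size. Because east-west distance shrinks with the cosine of latitude, a tile with a fixed angular size covers less ground the farther north it sits, so its ground footprint depends on where it is.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def ground_width_m(lon_span_deg: float, lat_deg: float) -> float:
    """East-west ground distance covered by a span of longitude at a latitude."""
    return math.radians(lon_span_deg) * EARTH_RADIUS_M * math.cos(math.radians(lat_deg))


def ground_height_m(lat_span_deg: float) -> float:
    """North-south ground distance covered by a span of latitude."""
    return math.radians(lat_span_deg) * EARTH_RADIUS_M


if __name__ == "__main__":
    span = 0.01  # hypothetical tile span in degrees
    for lat in (25, 38, 47):
        w = ground_width_m(span, lat)
        h = ground_height_m(span)
        print(f"lat {lat:2d}: {w:7.1f} m wide x {h:7.1f} m tall")
```

For example, at 47°N (roughly Seattle) a fixed longitude span covers only about three-quarters of the east-west ground distance it covers at 25°N (roughly Miami), since cos 47° / cos 25° ≈ 0.75.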
With this feedback in mind, we began work on TerraServer V3.0 in late fall 1998. TerraServer V3.0 was aimed
at changing the tiling scheme to accomplish the following:
The remainder of this page is under construction...
We experimented with a variety of tile sizes: 256x256, 250x250, 300x300, 150x150, and 200x200. We were
looking for the "sweet spot" where a compressed tile would be large enough to be worth storing,
small enough to download quickly, and a good visual fit on a variety of monitor sizes, such as 800x600 and 1024x768.
We found that 200x200 pixel tiles hit the sweet spot.
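Screen fit, one of the three criteria above, is simple arithmetic. This sketch checks only that geometric criterion for the tile sizes and monitor sizes named in the text; it does not model compressed size or download time, and the `fit` helper is hypothetical.

```python
TILE_SIZES = [256, 250, 300, 150, 200]  # sizes listed in the text
SCREENS = [(800, 600), (1024, 768)]     # monitor sizes listed in the text


def fit(screen_w: int, screen_h: int, tile: int) -> tuple[int, int, int, int]:
    """Whole tiles that fit across and down, plus the leftover pixels."""
    cols, rows = screen_w // tile, screen_h // tile
    return cols, rows, screen_w - cols * tile, screen_h - rows * tile


if __name__ == "__main__":
    for tile in TILE_SIZES:
        cells = ", ".join(
            "{}x{}: {}x{} tiles, {}px/{}px spare".format(w, h, *fit(w, h, tile))
            for w, h in SCREENS
        )
        print(f"{tile:3d}px -> {cells}")
```

Geometry alone does not pick a winner: 200-pixel tiles divide 800x600 exactly (4x3 tiles), while 256-pixel tiles divide 1024x768 exactly (also 4x3 tiles), so the other two criteria carry real weight.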