Terapixel

Using Microsoft technology to create the largest and clearest image of the sky

Imagine having the ability to take a virtual tour of the cosmos from your living room. Not just a flat, two-dimensional tour, but an experience so engrossing that you can see the entire sky at once and then zoom into detailed views of distant galaxies. The Terapixel project from Microsoft Research makes that possible by creating the largest and clearest image of the night sky ever produced, a terapixel image, now available in the WorldWide Telescope and Bing Maps.

 

A section of the Terapixel image

Terapixel is a showcase for Microsoft technologies in multi-core computing, distributed computing, and scientific workflow management. It demonstrates how technologies such as Windows HPC, .NET Parallel Extensions, Dryad/DryadLINQ, and the Trident Scientific Workflow Workbench can be used to create new possibilities for computation-intensive and data-intensive research in astronomy, bioinformatics, and environmental science.

Consolidating Data from Sky Survey Images

The Terapixel project began with data from the Digitized Sky Survey, a collection of thousands of images taken over a period of 50 years by two ground-based survey telescopes: the Palomar telescope in California, United States, and the UK Schmidt telescope in New South Wales, Australia. The Palomar telescope photographed the northern sky, and the southern sky down to around 30 degrees south. The UK Schmidt telescope photographed the rest of the southern sky. Each photograph covers an area of the cosmos six and one-half degrees square. For each section of the sky, the digital sky surveys provide two separate images containing the blue and red color intensities. The images themselves are monochromatic; the data simply represents the intensity of blue or red.

Before and after image smoothing

The telescope imaging process introduced artifacts into the plates, such as varying levels of brightness, noise, and color saturation, as well as vignetting (a darkening of the edges and corners of each plate). These needed correction in order to generate a clear and seamless image. Terapixel programmatically removed these anomalies, stitched and smoothed the images, and then created image pyramids for visualization in WorldWide Telescope (WWT).
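As a rough illustration of what a vignetting correction can look like, the sketch below divides each pixel by a simple radial falloff model. The model (`1 - strength * r^2`), the `strength` parameter, and the function name are assumptions made for this example; the article does not describe the correction that Terapixel actually applied.

```python
def correct_vignetting(plate, strength=0.3):
    """Undo an assumed radial darkening of the form 1 - strength * r^2.

    `plate` is a list of rows of pixel intensities. r is the distance from
    the plate center, normalized so that the corners sit at r = 1.
    """
    height, width = len(plate), len(plate[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    max_r2 = (cy * cy + cx * cx) or 1.0  # guard against a 1x1 plate
    corrected = []
    for y, row in enumerate(plate):
        out_row = []
        for x, value in enumerate(row):
            r2 = ((y - cy) ** 2 + (x - cx) ** 2) / max_r2
            gain = 1.0 - strength * r2  # fraction of light that survived
            out_row.append(value / gain)
        corrected.append(out_row)
    return corrected


# A uniformly bright 3x3 patch, darkened toward the edges and corners
# by the same assumed model.
darkened = [
    [70.0, 85.0, 70.0],
    [85.0, 100.0, 85.0],
    [70.0, 85.0, 70.0],
]
flattened = correct_vignetting(darkened, strength=0.3)
```

In this toy case every pixel of `flattened` comes back to roughly 100, because the darkening was generated by the same model that the correction inverts. Real plates would need the falloff to be estimated from the data.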

Using Microsoft Technologies to Manage Processes

Developers used Trident, the scientific workflow tool developed by Microsoft Research, to create and manage all of the workflows within the project. Each stage of the process is a Trident workflow activity, from the initial data preparation to sending the terapixel image to the WorldWide Telescope.

Given the large amount of data and computation involved, programmers used DryadLINQ and .NET Parallel Extensions to manage code running in parallel on the multi-core machines of a Windows high-performance computing cluster. By making use of a 64-node cluster (512 cores), they were able to compute the final terapixel image from the raw digitized data in a little more than half a day.
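To give a feel for the scale of the parallelism, the sketch below statically splits the 1,791 plate pairs listed under "By the Numbers" across the 64 nodes. The round-robin scheme and the function name are assumptions for illustration only; DryadLINQ performs its own partitioning and scheduling, which this does not attempt to reproduce.

```python
def partition_round_robin(num_items, num_workers):
    """Assign item indices 0..num_items-1 to workers in round-robin order."""
    buckets = [[] for _ in range(num_workers)]
    for item in range(num_items):
        buckets[item % num_workers].append(item)
    return buckets


PLATE_PAIRS = 1791     # red/blue plate pairs, from the article
NODES = 64             # compute nodes, from the article
CORES_PER_NODE = 8     # eight cores per node, from the article

buckets = partition_round_robin(PLATE_PAIRS, NODES)
sizes = [len(bucket) for bucket in buckets]

print(NODES * CORES_PER_NODE)   # 512 cores in total
print(min(sizes), max(sizes))   # 27 28 plate pairs per node
```

Under this split each node handles 27 or 28 plate pairs, so the per-plate stages parallelize evenly across the cluster.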

"I look forward to adopting this technology into our everyday compute environment, where we face similar challenges when processing huge datasets."

 

—Dr. Brian McLean, Observatory Scientist

Space Telescope Science Institute

 

Creating a Seamless Image of the Night Sky

Once the original files are decompressed, they undergo a series of programmatic changes to correct the vignetting problem. The red and blue plates are then aligned astrometrically and combined to form a new color image, which also contains metadata that maps it to sky coordinates.
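The combination step can be pictured with a small sketch. It assumes the two plates are already aligned and share the same pixel grid, and it synthesizes the missing green channel as the average of red and blue. That green rule and the function name are assumptions for illustration; the article does not say how Terapixel derived the green channel.

```python
def combine_plates(red, blue):
    """Merge aligned red and blue intensity grids into rows of (R, G, B) tuples.

    Green is not observed, so it is approximated here as the integer
    average of red and blue (an assumption, not the project's method).
    """
    if len(red) != len(blue) or any(len(r) != len(b) for r, b in zip(red, blue)):
        raise ValueError("red and blue plates must have the same shape")
    return [
        [(r, (r + b) // 2, b) for r, b in zip(red_row, blue_row)]
        for red_row, blue_row in zip(red, blue)
    ]


red_plate = [[200, 10]]
blue_plate = [[100, 30]]
color = combine_plates(red_plate, blue_plate)
print(color)  # [[(200, 150, 100), (10, 20, 30)]]
```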

The next step is to stitch the color images together into a spherical image and smooth the seams of that image. Terapixel uses the global image optimization program developed by Hugues Hoppe and Dinoj Surendran of Microsoft Research and Michael Kazhdan of Johns Hopkins University. The gradients across the image boundaries are set to zero, resulting in a seamless spherical panorama.
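The idea of zeroing gradients across boundaries can be shown with a one-dimensional toy. Two neighboring "plates" have the same internal structure but different overall brightness; the sketch takes pixel-to-pixel differences, sets the difference across the boundary to zero, and re-integrates. This is only an illustration of the principle under simplifying assumptions. The actual program solves a global optimization over the whole sphere, and the function name here is hypothetical.

```python
def remove_seam_1d(left, right):
    """Join two 1-D pixel strips, removing the brightness jump between them."""
    if not left or not right:
        raise ValueError("both strips must contain at least one pixel")
    signal = left + right
    seam = len(left)  # index of the first pixel of the right strip
    # Differences between consecutive pixels (the 1-D gradient).
    gradients = [signal[i] - signal[i - 1] for i in range(1, len(signal))]
    # Force the gradient that crosses the strip boundary to zero.
    gradients[seam - 1] = 0
    # Re-integrate the gradients, starting from the first pixel.
    result = [signal[0]]
    for g in gradients:
        result.append(result[-1] + g)
    return result


left_strip = [10, 12, 14]
right_strip = [24, 26, 28]   # same slope, but 10 units brighter
print(remove_seam_1d(left_strip, right_strip))  # [10, 12, 14, 14, 16, 18]
```

The detail inside each strip (the slope of 2 per pixel) is preserved, while the jump of 10 at the boundary disappears.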

The result of the Terapixel project is a full-color, 24-bit RGB terapixel image of the night sky. The artifacts of the original telescope imaging process have been programmatically removed. The resulting image can be viewed in the WorldWide Telescope and in Bing Maps.
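To put "terapixel" in perspective, the arithmetic below estimates how deep a tiled image pyramid must go to hold about 10^12 pixels. It assumes a quadtree pyramid of 256 x 256 pixel tiles in which level n holds 2^n x 2^n tiles. These layout details are assumptions for this sketch and are not stated in the article.

```python
TILE_SIDE = 256        # assumed tile size in pixels
BYTES_PER_PIXEL = 3    # 24-bit RGB, from the article


def deepest_level(target_pixels, tile_side=TILE_SIDE):
    """Smallest level whose 2^n x 2^n tiles hold at least target_pixels."""
    level = 0
    while (4 ** level) * tile_side ** 2 < target_pixels:
        level += 1
    return level


level = deepest_level(10 ** 12)
side = (2 ** level) * TILE_SIDE            # pixels along one edge
raw_bytes = side * side * BYTES_PER_PIXEL  # uncompressed size of that level

print(level)  # 12
print(side)   # 1048576
```

Under these assumptions the deepest level is roughly a million pixels on a side, and storing it uncompressed at 24 bits per pixel would take about 3.3 x 10^12 bytes.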

By the Numbers

Raw data

  • 1,791 pairs of red-light and blue-light images acquired from two telescopes, scanned into 23,040 x 23,040 or 14,000 x 14,000 pixel images.

Windows HPC Cluster

  • The high-performance computational platform used to run Terapixel, consisting of 64 compute nodes, each an eight-core Intel Xeon CPU with 16 GB RAM and 1.7 TB of storage.

Generation of RGB plates

  • Processing time: 5 hours
  • Input: 417 GB compressed (4 TB uncompressed)
  • Output: 790 GB

Stitch images into a spherical image

  • Processing time: 3 hours

Optimize image to remove seams

  • Processing time: 4 hours 15 minutes

Move data off the cluster

  • Transfer time: 2.5 hours (1 Gbps link)

Output

  • 1025 pyramid files; total size: 802 GB
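The figures above can be cross-checked with a little arithmetic. The sketch below sums the three processing stages and estimates the effective transfer rate. It assumes that the 802 GB of output is what was moved off the cluster and that 1 GB means 10^9 bytes; neither is stated explicitly in the article.

```python
# Stage durations in hours, copied from the figures above.
STAGE_HOURS = {
    "generate RGB plates": 5.0,
    "stitch spherical image": 3.0,
    "optimize seams": 4.25,
}
TRANSFER_HOURS = 2.5
OUTPUT_GB = 802    # assumed to be the data moved off the cluster
LINK_GBPS = 1.0

compute_hours = sum(STAGE_HOURS.values())
total_hours = compute_hours + TRANSFER_HOURS

# Effective throughput in gigabits per second, assuming 1 GB = 10**9 bytes.
effective_gbps = OUTPUT_GB * 8 / (TRANSFER_HOURS * 3600)
link_utilization = effective_gbps / LINK_GBPS

print(f"compute: {compute_hours} h, with transfer: {total_hours} h")
print(f"effective transfer rate: {effective_gbps:.2f} Gbps")
```

The three processing stages add up to 12.25 hours, which matches the "little more than half a day" quoted earlier. Under the stated assumptions, the transfer used roughly 70 percent of the 1 Gbps link.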

 

A Microsoft Research Connections-funded project
supporting advanced technology research
 

 

Project Principals

Microsoft Research

Space Telescope Science Institute

  • Brian McLean

Johns Hopkins University

  • Michael Kazhdan