Koh-i-Noor

As of August 2003, Koh-i-Noor is no longer under active development, although we continue to stumble on interesting related results and ideas.

In the Koh-i-Noor project, we are investigating how to construct large, inexpensive, reliable disks. We use somewhat novel erasure codes (similar to standard Reed-Solomon codes) and parallel reconstruction techniques during repair. The goal is to allow the construction of extremely large virtual disks (100 terabytes to 1 petabyte, using today's technology), built by organizing clusters of small disks into modest-sized groups that provide low-level reliability and reduce maintenance costs without imposing high overhead on the cost of storage.

In particular, we expect to build clusters of up to 256 disks, each attached to a separate processor. Each processor is connected to two independent networks, organized as a tree using small, inexpensive switches. The goal is to provide reliability, not availability, so we assume that processor and/or network reboots will cure many transient failures. Blocks are allocated to the disks using a mapping function that distributes primary storage uniformly; blocks initially mapped to permanently failed disks are remapped to vacant blocks on surviving disks. Erasure-recovery locations are also assigned by a mapping function. We limit the capacity of a cluster to less than 85% of the apparent capacity, both to leave room for correction blocks (~1.5%) and to leave excess capacity for re-vectoring blocks after failure. With roughly 15% overcapacity, we would expect our unluckiest cluster to still have spare capacity after five years, assuming an MTTF of 50 years for individual disk drives. If we suffer no cabling failures and no dependent hardware failures, and if CPUs have an MTTF similar to that of disks, we might hope to defer any maintenance on any cluster for up to five years.
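The spare-capacity argument above can be checked with a rough back-of-the-envelope calculation. The sketch below is not the project's own analysis: it assumes independent disk failures with exponentially distributed lifetimes, assumes the cluster is filled to its 85% limit plus ~1.5% for correction blocks, and uses hypothetical function names. Under those assumptions, about 9.5% of disks (roughly 24 of 256) are expected to fail within five years, while the cluster can absorb up to 34 failures before the surviving disks run out of room.

```python
from math import comb, exp, floor


def failure_probability(mttf_years: float, years: float) -> float:
    """Probability that one disk fails within `years` (assumed exponential lifetime)."""
    return 1.0 - exp(-years / mttf_years)


def max_tolerable_failures(n_disks: int, used_fraction: float) -> int:
    """Largest number of failed disks whose blocks still fit on the survivors.

    `used_fraction` is the share of raw capacity holding data plus correction
    blocks. The rounding guards against floating-point noise just below an integer.
    """
    return floor(round(n_disks * (1.0 - used_fraction), 9))


def prob_spare_exhausted(n_disks: int, mttf_years: float, years: float,
                         used_fraction: float) -> float:
    """Binomial tail: chance that failures exceed what the spare capacity can absorb."""
    p = failure_probability(mttf_years, years)
    limit = max_tolerable_failures(n_disks, used_fraction)
    survives = sum(comb(n_disks, k) * p**k * (1.0 - p) ** (n_disks - k)
                   for k in range(limit + 1))
    return 1.0 - survives


if __name__ == "__main__":
    # Figures from the text: 256 disks, 50-year MTTF, 5 years, 85% data + 1.5% correction.
    used = 0.85 + 0.015
    print("per-disk failure probability:", failure_probability(50, 5))
    print("expected failures:", 256 * failure_probability(50, 5))
    print("tolerable failures:", max_tolerable_failures(256, used))
    print("P(spare exhausted):", prob_spare_exhausted(256, 50, 5, used))
```

A cluster that is filled below the 85% limit has more headroom than this worst-case setting suggests, so the last figure should be read as a pessimistic estimate under the stated model.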

Triple-erasure-correcting codes give us an expected time of 50,000 years until we experience data loss on any block in the petabyte.
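For readers who want to see how an estimate of this kind is typically formed, the following sketch uses the standard Markov-model approximation for mean time to data loss (MTTDL), the same style of formula used for RAID analysis. It is not the calculation behind the 50,000-year figure: the group size, repair time, and number of groups are not given here, so every value in the usage example is a hypothetical placeholder, and the approximation only holds when repair time is much shorter than disk MTTF.

```python
from math import prod


def group_mttdl(n_disks: int, mttf: float, mttr: float, tolerated: int = 3) -> float:
    """Approximate MTTDL of one erasure-coded group, in the units of `mttf`.

    Data is lost when `tolerated + 1` disks are down at the same time:
        MTTDL ~= mttf**(t+1) / (n * (n-1) * ... * (n-t) * mttr**t)
    Assumes independent failures and mttr << mttf.
    """
    if n_disks <= tolerated:
        raise ValueError("group must contain more disks than it tolerates losing")
    # n * (n-1) * ... * (n - tolerated): ordered ways to pick the failing disks
    orderings = prod(n_disks - i for i in range(tolerated + 1))
    return mttf ** (tolerated + 1) / (orderings * mttr ** tolerated)


def system_mttdl(num_groups: int, n_disks: int, mttf: float, mttr: float,
                 tolerated: int = 3) -> float:
    """MTTDL for `num_groups` independent groups: the first group to fail decides."""
    return group_mttdl(n_disks, mttf, mttr, tolerated) / num_groups


if __name__ == "__main__":
    # Hypothetical inputs: 32-disk groups, 50-year MTTF, one-day repair, 1000 groups.
    years = system_mttdl(num_groups=1000, n_disks=32, mttf=50.0, mttr=1.0 / 365.0)
    print(f"system MTTDL under these assumptions: {years:.3g} years")
```

The formula makes the benefit of each additional tolerated erasure visible: every extra erasure multiplies the result by roughly `mttf / (n * mttr)`, which is why fast parallel reconstruction matters as much as the code itself.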

We can attach a greater number of disks to a CPU by assigning them to independent clusters. A CPU or controller failure can then affect several disks, but not ones that rely on the same set of erasure-correction disks. This does affect the statistics of total-system data loss, which argues for more frequent servicing.


©2008 Microsoft Corporation. All rights reserved.