Does erasure coding have a role to play in my data center?

Today, replication is the de facto standard for storing data within and across data centers that process data-intensive workloads. Erasure coding (a form of software RAID), although heavily researched and theoretically more space-efficient than replication, has complex tradeoffs that are not well understood by practitioners. Today's data centers run diverse foreground and background data-intensive workloads, and getting these tradeoffs right is becoming increasingly important. Through a series of realistic data center deployment scenarios and workload characteristics, coupled with the implementation of a prototype Hadoop library with erasure coding functionality, we revisit traditional metrics (performance and dollar cost), present new tradeoffs (power proportionality and complexity), and make recommendations on directions worth researching.
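
The space-efficiency claim in the abstract can be illustrated with a small calculation. The sketch below is not taken from the report: the function names and the example parameters (3-way replication and a code with 6 data and 3 parity fragments) are assumptions chosen for illustration, and the fault-tolerance figure for erasure coding assumes a maximum distance separable code such as Reed-Solomon. It compares raw storage per byte of user data only, and says nothing about the performance, cost, power, or complexity tradeoffs the report examines.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data under n-way replication."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return Fraction(copies)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> Fraction:
    """Raw bytes stored per byte of user data under a (k data, m parity) code."""
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return Fraction(data_fragments + parity_fragments, data_fragments)


def replication_failures_tolerated(copies: int) -> int:
    """Copies that can be lost while one intact copy remains."""
    return copies - 1


def erasure_failures_tolerated(parity_fragments: int) -> int:
    """Fragments that can be lost, assuming an MDS code (hypothetical choice)."""
    return parity_fragments


if __name__ == "__main__":
    # Illustrative parameters only; they are not the report's configuration.
    print("3-way replication:", float(replication_overhead(3)), "x raw storage,",
          replication_failures_tolerated(3), "losses tolerated")
    print("(6 data, 3 parity):", float(erasure_overhead(6, 3)), "x raw storage,",
          erasure_failures_tolerated(3), "losses tolerated")
```

Under these assumed parameters, the erasure-coded layout stores half as many raw bytes as 3-way replication while tolerating one more lost fragment, which is the sense in which erasure coding is more space-efficient in theory.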


Publisher  Microsoft Research
© 2009 Microsoft Corporation. All rights reserved.

Details

Type  TechReport
Number  MSR-TR-2010-52