Johnson Apacible, Rich Draves, Jeremy Elson, Jinliang Fan, Owen Hofmann, Jon Howell, Ed Nightingale, Reuben Olinsky, and Yutaka Suzue
15 May 2012
We have built a new high-performance distributed blob storage system, called Flat Datacenter Storage (FDS). Our MinuteSort entry is a relatively simple Daytona-class sort application that uses FDS for storage. We also have an Indy mode that is identical to the Daytona version except that input sampling is disabled and a uniform key distribution is assumed.
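The difference between the two modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the FDS sort code: the function names (`uniform_bucket`, `choose_splitters`, `sampled_bucket`), the 8-byte prefix, and the bucket counts are all hypothetical. An Indy-style partitioner may map a key straight to a bucket because it assumes keys are spread uniformly, whereas a Daytona-style partitioner first samples the input and derives bucket boundaries from the sample.

```python
import bisect


def uniform_bucket(key: bytes, num_buckets: int, prefix_len: int = 8) -> int:
    """Indy-style sketch: assume keys are uniformly distributed.

    The leading bytes of the key are read as a big-endian integer and
    scaled onto the range [0, num_buckets). No sampling is needed.
    """
    prefix = key[:prefix_len].ljust(prefix_len, b"\x00")
    value = int.from_bytes(prefix, "big")
    return (value * num_buckets) >> (8 * prefix_len)


def choose_splitters(sample: list[bytes], num_buckets: int) -> list[bytes]:
    """Daytona-style sketch: pick num_buckets - 1 boundaries from a sample."""
    ordered = sorted(sample)
    return [ordered[i * len(ordered) // num_buckets] for i in range(1, num_buckets)]


def sampled_bucket(key: bytes, splitters: list[bytes]) -> int:
    """Place a key by binary search over the sampled boundaries."""
    return bisect.bisect_right(splitters, key)


if __name__ == "__main__":
    # Skewed input (hypothetical): every key starts with a zero byte.
    keys = [bytes([0, i]) for i in range(256)]
    splitters = choose_splitters(keys, 4)
    uniform_counts = [0] * 4
    sampled_counts = [0] * 4
    for k in keys:
        uniform_counts[uniform_bucket(k, 4)] += 1
        sampled_counts[sampled_bucket(k, splitters)] += 1
    print("uniform assumption:", uniform_counts)
    print("sampled splitters: ", sampled_counts)
```

On skewed input such as the keys in the usage example, the uniform assumption sends every record to a single bucket, while sampled splitters keep the buckets balanced. That is why a general-purpose (Daytona) sort cannot skip the sampling step.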
The sorts were accomplished using a heterogeneous cluster of 256 computers and 1,033 disks, divided broadly into two classes: storage nodes and compute nodes. Notably, no compute node in our system uses local storage for data; we believe FDS is the first system with competitive sort performance that uses remote storage. Because all files are remote, our 1,470 GB runs actually transmitted 4.4 TB over the network, about three times the volume of data sorted, in under a minute. We make no strong assumptions about key or record lengths; keys and records of other lengths can be handled with only a performance-neutral configuration change.