Going beyond Paxos

  • Mahesh Balakrishnan ,
  • Dahlia Malkhi ,
  • Vijayan Prabhakaran ,
  • Ted Wobber

MSR-TR-2011-86 |

Corfu exposes a cluster of flash devices to applications as a single, shared log. Applications can append data to this log or read from the middle. Internally, this shared log is implemented as a distributed log spread over the flash cluster. There are two reasons why this design makes sense:

  1. From a bottom-up perspective, flash requires log-structured writes to ensure even and minimal wear-out as well as high throughput. By implementing a distributed log, we eliminate the need for low-level logging on each flash device. This means we can operate over very dumb flash chips directly attached to the network, resulting in massive savings in power and infrastructure cost (basically, we don’t need storage servers any more).
  2. From a top-down perspective, a really fast flash-based shared log is great for applications that need strong consistency, such as databases, transactional key-value stores and metadata services. We can run a database at speeds that saturate the raw flash. For some types of strongly consistent operations (like atomic updates), we are able to run at a few hundred thousand operations per second.

The current Corfu implementation has been deployed over a cluster of 32 Intel X25M server-attached SSDs. This deployment currently supports 400K 4KB reads/sec and 200K 4KB appends/sec. Several applications have been prototyped over Corfu, include a transactional key-value store and a full replicated database. While we are still evaluating these applications, the initial results are promising; for instance, our key-value store can support atomic multi-gets and multi-puts involving ten 4KB keys each at speeds of 40K/sec and 20K/sec, respectively.