The goal of the Flashlight project at MSR Silicon Valley is to explore existing and new flash architectures and to build tools to aid in that endeavor.
CORFU (Clusters of Raw/Redundant Flash Units): Corfu organizes a cluster of flash devices into a single, coherent drive accessed by clients over the network. Each flash device in the cluster is a custom unit of low-power, low-cost hardware that allows raw flash to be attached directly to the network. Corfu uses client-side logic (a new variant of Paxos) to implement the abstraction of a single, cluster-scale drive.
The primary interface exposed by the Corfu drive to applications is a shared, globally-ordered log, which allows multiple clients to concurrently access the drive at high speeds without sacrificing consistency. Corfu supports other interfaces as well, including a linear address space; accordingly, multiple Corfu instances can be used to support a pool of non-shared, mountable volumes. Corfu's distributed nature allows flash bandwidth, capacity and write cycles to be incrementally scaled and shared across multiple clients. A single Corfu drive provides write throughput of up to 1M writes/sec, while read throughput scales linearly with the number of flash devices.
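The shared-log interface can be illustrated with a minimal single-process sketch. Everything here is an assumption made for illustration: the names `FlashUnit` and `SharedLog`, the round-robin mapping of log positions to devices, and the local tail counter are hypothetical, and the sketch omits the networked, Paxos-variant client logic that Corfu actually uses.

```python
class FlashUnit:
    """Hypothetical stand-in for one flash device: a write-once page store."""

    def __init__(self):
        self.pages = {}

    def write(self, addr, data):
        # Each page may be written only once, which keeps log entries immutable.
        if addr in self.pages:
            raise ValueError(f"page {addr} already written")
        self.pages[addr] = data

    def read(self, addr):
        return self.pages[addr]


class SharedLog:
    """Toy globally ordered log striped over several flash units (assumed layout)."""

    def __init__(self, num_units):
        self.units = [FlashUnit() for _ in range(num_units)]
        self.tail = 0  # next free log position; a local counter in this sketch

    def _locate(self, pos):
        # Assumed round-robin mapping: position -> (unit, page address on that unit).
        return self.units[pos % len(self.units)], pos // len(self.units)

    def append(self, data):
        pos = self.tail
        self.tail += 1
        unit, addr = self._locate(pos)
        unit.write(addr, data)
        return pos

    def read(self, pos):
        unit, addr = self._locate(pos)
        return unit.read(addr)
```

Under this assumed layout, consecutive log positions land on different devices, which is one plausible way for reads of different positions to be served by different devices in parallel.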
SSD Performance: We extend DiskSim, the popular trace-driven disk simulator from CMU, with an SSD (solid state disk) simulation module, and exercise the simulator under various real workload traces. This simulation allows us to explore how real I/O systems (for example, those that support transaction processing) will perform when using SSDs rather than disks. We explore the performance of several potential organizations of flash chips and test the efficacy of various cleaning and wear-leveling algorithms under real workloads. Our initial focus is on server-side workloads.
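As an illustration of what a cleaning policy decides, here is a minimal sketch of greedy victim selection. It is an assumption for illustration, not the policy implemented in the simulator: the `Block` record, the tie-break on erase count, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    valid_pages: int   # pages that must be copied out before the block is erased
    erase_count: int   # number of times this block has been erased so far


def pick_victim(blocks):
    """Greedy cleaning: fewest valid pages first; ties go to the less-worn block."""
    return min(blocks, key=lambda b: (b.valid_pages, b.erase_count))


def clean(victim):
    """Erase the victim and return how many valid pages had to be relocated."""
    moved = victim.valid_pages
    victim.valid_pages = 0
    victim.erase_count += 1
    return moved
```

Choosing the block with the fewest valid pages minimizes the copying done per erase, while the tie-break on erase count is a simple nod toward wear-leveling.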
Flash Research Platform: We are building a flexible platform for solid state storage research by integrating FPGAs, DRAM, and flash devices. The design leverages reconfigurable hardware to provide maximum flexibility for innovative architectural and algorithmic design of next-generation storage systems.
TxFlash: Traditional storage devices export block-based APIs. Supporting atomic multiple-page writes is desirable, but often comes with non-negligible performance penalties. We observe that such penalties might be significantly reduced for flash memory used in SSDs due to its specific properties such as non-overwrite page writes and fast random reads. In TxFlash, we develop a set of protocols for SSDs to support multiple-page writes with ACID properties, explore their performance characteristics, and assess the implications of such an API on higher-level applications such as file systems.
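To show why non-overwrite page writes make atomic multi-page updates cheap, here is a generic shadow-paging sketch. It is not the set of protocols developed in TxFlash; the class `TxSSD`, the `fail_after` crash simulation, and the assumption that the final mapping update is atomic are all hypothetical.

```python
class TxSSD:
    """Toy device: new data always goes to fresh pages; a mapping switch commits it."""

    def __init__(self):
        self.flash = []    # append-only physical pages (never overwritten in place)
        self.mapping = {}  # committed logical page number -> physical page index

    def write_atomic(self, updates, fail_after=None):
        """Apply all updates or none. fail_after simulates a crash after N page writes."""
        staged = {}
        for i, (lpn, data) in enumerate(updates.items()):
            if fail_after is not None and i >= fail_after:
                return False  # crash before commit: old mapping is untouched
            self.flash.append(data)
            staged[lpn] = len(self.flash) - 1
        # Commit point (assumed atomic in this sketch): expose all new pages at once.
        self.mapping.update(staged)
        return True

    def read(self, lpn):
        return self.flash[self.mapping[lpn]]
```

Because old page versions stay intact until the mapping changes, an interrupted transaction leaves readers with the previous consistent state and only some orphaned pages to reclaim.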
SSD Lifetime: Two trends can derail the adoption of Solid State Devices (SSDs) as primary storage: first, general-purpose workloads are harder on flash than mobile applications are; second, increasing flash densities result in fewer erase cycles per block. This combination of stressful workloads and fewer erase cycles can significantly reduce SSD lifetime. We propose a hybrid storage device that uses a hard disk drive (HDD) as a write cache for an SSD. Our design is motivated by two observations: first, HDDs can match the sequential write bandwidth of mid-range SSDs; second, both server and desktop workloads contain a significant fraction of block overwrites. By maintaining a log-structured HDD cache and migrating cached data periodically, our hybrid design reduces writes to the SSD while retaining its excellent performance. We evaluated our system using a variety of I/O traces from Windows and found that it extends SSD lifetime by a factor of two and reduces average I/O latency by 42%.
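The way a log-structured cache absorbs overwrites can be sketched as follows. This is a simplified model under stated assumptions, not the evaluated system: `HybridDevice` is a hypothetical name, migration is triggered by a write-count threshold rather than a timer, and reads of cached blocks are served by scanning the log.

```python
class HybridDevice:
    """Toy hybrid: writes are appended to an HDD log and migrated to the SSD in batches."""

    def __init__(self, migrate_every):
        self.hdd_log = []          # append-only list of (block, data) entries
        self.ssd = {}              # block -> data stored on the SSD
        self.ssd_writes = 0        # counts page writes that actually reach the SSD
        self.migrate_every = migrate_every

    def write(self, block, data):
        self.hdd_log.append((block, data))  # sequential write to the HDD cache
        if len(self.hdd_log) >= self.migrate_every:
            self.migrate()

    def migrate(self):
        # Keep only the newest version of each block; older versions never hit flash.
        latest = {}
        for block, data in self.hdd_log:
            latest[block] = data
        for block, data in latest.items():
            self.ssd[block] = data
            self.ssd_writes += 1
        self.hdd_log.clear()

    def read(self, block):
        # Newest cached version wins; otherwise fall back to the SSD.
        for b, data in reversed(self.hdd_log):
            if b == block:
                return data
        return self.ssd[block]
```

In this model, every overwrite that occurs between two migrations is absorbed by the HDD log, so the number of SSD writes per batch equals the number of distinct blocks written rather than the number of write requests.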
SSD Reliability: Redundancy schemes such as RAID-5 are highly susceptible to correlated failures when used on SSDs: when an old device fails, it is highly probable that some data cannot be recovered from the remaining devices. This data loss occurs because SSDs wear out and exhibit higher Bit Error Rates (BERs) as they receive more writes. Since conventional RAID schemes balance writes evenly across devices, they wear SSDs out at similar rates. Intuitively, such solutions attempt to protect data on aging devices by storing it on other, equally old devices. We propose Diff-RAID, a parity-based redundancy solution that creates an age differential in an array of SSDs. Diff-RAID distributes parity blocks unevenly across the array, leveraging their higher update rate to age devices at different rates. Diff-RAID is more reliable than RAID-5 for different flash chips by one or more orders of magnitude, and offers a smooth trade-off between reliability and throughput.
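The effect of uneven parity placement on device aging can be sketched with a small model. The assumptions are labeled here and in the code: parity is assigned by integer weights per device, updates are uniformly random single-block writes, and each update costs one data write plus one parity write. The function names and the weight scheme are hypothetical, not taken from the Diff-RAID implementation.

```python
def parity_device(stripe, weights):
    """Return the device holding parity for a stripe, given integer per-device weights."""
    slot = stripe % sum(weights)
    for device, weight in enumerate(weights):
        if slot < weight:
            return device
        slot -= weight


def write_share(weights):
    """Expected writes per device for one random single-block update (assumed model).

    A device holding parity for a fraction p of stripes receives the parity write
    with probability p, and otherwise holds the updated data block with
    probability 1 / (n - 1).
    """
    n = len(weights)
    total = sum(weights)
    shares = []
    for weight in weights:
        p = weight / total
        shares.append(p + (1 - p) / (n - 1))
    return shares
```

With equal weights, which corresponds to RAID-5 style rotation, every device receives the same share of writes; with skewed weights, the parity-heavy device receives more writes and therefore ages faster than the others, which is the age differential the text describes.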
- Michael Wei, John D. Davis, Ted Wobber, Mahesh Balakrishnan, and Dahlia Malkhi, Beyond Block I/O: Implementing a Distributed Shared Log in Hardware, in SYSTOR 2013 (the 6th International Systems and Storage Conference), ACM, 2013
- Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, Ted Wobber, Michael Wei, and John Davis, CORFU: A Shared Log Design for Flash Clusters, in 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI '12), USENIX, April 2012
- Dahlia Malkhi, Mahesh Balakrishnan, John Davis, Vijayan Prabhakaran, and Ted Wobber, From Paxos to CORFU: A Flash-Speed Shared Log, in ACM SIGOPS Operating Systems Reviews, ACM SIGOPS, 2012
- Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, and Ted Wobber, Going beyond Paxos, no. MSR-TR-2011-86, July 2011
- Mahesh Balakrishnan, Phil Bernstein, Dahlia Malkhi, Vijayan Prabhakaran, and Colin Reid, Brief Announcement: Flash-Log -- A High Throughput Log, in 24th International Symposium on Distributed Computing (DISC 2010), Springer Verlag, September 2010
- Vijayan Prabhakaran, Mahesh Balakrishnan, John D. Davis, and Ted Wobber, Depletable Storage Systems, in 2nd Workshop on Hot Topics in Storage and File Systems, USENIX, 22 June 2010
- Mahesh Balakrishnan, Asim Kadav, Vijayan Prabhakaran, and Dahlia Malkhi, Differential RAID: Rethinking RAID for SSD Reliability, in Fifth European Conference on Computer Systems (EuroSys 2010), Association for Computing Machinery, Inc., April 2010
- Gokul Soundararajan, Vijayan Prabhakaran, Mahesh Balakrishnan, and Ted Wobber, Extending SSD Lifetimes with Disk-Based Write Caches, in FAST 2010: 8th USENIX Conference on File and Storage Technologies, USENIX, February 2010
- Asim Kadav, Mahesh Balakrishnan, Vijayan Prabhakaran, and Dahlia Malkhi, Differential RAID: Rethinking RAID for SSD Reliability, in HotStorage 2009: 1st Workshop on Hot Topics in Storage and File Systems, Association for Computing Machinery, Inc., October 2009
- Abhishek Rajimwale, Vijayan Prabhakaran, and John D. Davis, Block Management in Solid-State Devices, in Proceedings of the USENIX Annual Technical Conference (USENIX'09), USENIX, June 2009
- John D. Davis and Lintao Zhang, FRP: a Nonvolatile Memory Research Platform Targeting NAND Flash, in The First Workshop on Integrating Solid-state Memory into the Storage Hierarchy, Held in Conjunction with ASPLOS 2009, Association for Computing Machinery, Inc., March 2009
- Vijayan Prabhakaran, Thomas L. Rodeheffer, and Lidong Zhou, Transactional Flash, in Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’08), USENIX, December 2008
- Nitin Agrawal, Vijayan Prabhakaran, Ted Wobber, John D. Davis, Mark Manasse, and Rina Panigrahy, Design Tradeoffs for SSD Performance, in Proceedings of the 2008 USENIX Technical Conference (USENIX'08), USENIX, June 2008
- Andrew Birrell, Michael Isard, Chuck Thacker, and Ted Wobber, A Design for High-Performance Flash Disks, in Operating Systems Review, vol. 41, no. 2, pp. 88-93, Association for Computing Machinery, Inc., April 2007