John D. Davis, Karin Strauss, Parikshit Gopalan, Mark Manasse, and Sergey Yekhanin
Zombie is an endurance management framework that enables a variety of error correction mechanisms to extend the lifetimes of memories that suffer from bit failures caused by wearout, such as phase-change memory (PCM). Zombie supports both single-level cell (SLC) and multi-level cell (MLC) variants. It extends the lifetime of blocks in working memory pages (primary blocks) by pairing them with spare blocks: working blocks in pages that have been disabled because a single block exhausted its error correction resources, and that would otherwise be 'dead'. Spare blocks adaptively provide error correction resources to primary blocks as failures accumulate over time. This reduces the waste caused by early block failures and turns working blocks in discarded pages into a useful resource. Although we use PCM as the target technology, Zombie applies to any memory technology that suffers stuck-at cell failures.
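To make the pairing idea concrete, here is a minimal Python sketch of a toy model. It assumes that every block has the same fixed budget of correctable stuck-at failures (the hypothetical `budget` parameter), that a primary block can be paired with at most one spare, and that a spare donates whatever part of its budget it has not used for its own failures. The names `ZombieToyModel`, `record_failure`, and `disable` are illustrative only; this is not the report's read/write algorithm, nor a model of ZombieXOR, ZombieECP, ZombieERC, or ZombieMLC.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    failures: int = 0                # stuck-at cells accumulated so far
    spare: Optional["Block"] = None  # paired spare block, if any


@dataclass
class Page:
    blocks: List[Block]
    disabled: bool = False


class ZombieToyModel:
    """Hypothetical toy model: primary blocks borrow correction budget from
    spare blocks harvested from disabled pages. Not the report's algorithm."""

    def __init__(self, num_pages: int, blocks_per_page: int, budget: int = 6):
        self.pages = [Page([Block() for _ in range(blocks_per_page)])
                      for _ in range(num_pages)]
        self.budget = budget               # assumed per-block correction budget
        self.spare_pool: List[Block] = []  # working blocks from disabled pages

    def capacity(self, block: Block) -> int:
        # A paired spare donates the part of its budget it has not used itself.
        donated = 0
        if block.spare is not None:
            donated = max(0, self.budget - block.spare.failures)
        return self.budget + donated

    def record_failure(self, page_idx: int, block_idx: int) -> bool:
        """Add one stuck-at failure; return True if the page stays enabled."""
        page = self.pages[page_idx]
        if page.disabled:
            return False
        block = page.blocks[block_idx]
        block.failures += 1
        # Own budget exhausted: try to revive the block with a spare.
        if (block.failures > self.capacity(block)
                and block.spare is None and self.spare_pool):
            block.spare = self.spare_pool.pop()
        if block.failures > self.capacity(block):
            self.disable(page)
            return False
        return True

    def disable(self, page: Page) -> None:
        page.disabled = True
        for b in page.blocks:
            if b.spare is not None:
                self.spare_pool.append(b.spare)  # release the partner spare
                b.spare = None
            if b.failures <= self.budget:
                self.spare_pool.append(b)  # still-working block becomes a spare
```

With `budget=2`, the first page is disabled at its third failure and donates its three working blocks to the spare pool; a block in the second page then tolerates two more failures than it could on its own before its page is disabled as well, at which point its spare and the page's working blocks return to the pool.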
This paper provides supplemental information to our companion paper, which describes the Zombie framework: a combination of two new error correction mechanisms (ZombieXOR for SLC and ZombieMLC for MLC) and extensions of two previously proposed SLC mechanisms (ZombieECP and ZombieERC). We present the read and write algorithms, an analytical model for SLC, a detailed discussion of the MLC mechanism, and an analysis demonstrating reduced drift-induced soft errors for MLC. This additional material, which could not fit within the companion paper's page limit, demonstrates the feasibility of PCM, especially MLC, and supports our results: a 58% to 92% endurance improvement for ZombieSLC memory and an even larger 11X to 17X improvement for ZombieMLC, both with performance overheads of only 0.1% at the point when memories using prior error correction mechanisms reach end of life.
Publisher: Microsoft Technical Report