Decoding STAR Code for Tolerating Simultaneous Disk Failure and Silent Errors

Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2010), Chicago, IL, USA

Published by Institute of Electrical and Electronics Engineers

Publication

As storage systems grow in size and complexity, various hardware and software component failures inevitably occur, resulting in disk malfunctions and failures, as well as silent errors. Existing techniques and schemes handle failures and silent errors separately. In this paper, we advocate using the STAR code as a unified and systematic mechanism to simultaneously tolerate failures on one disk and silent errors on another. By exploring the unique geometric structure of the STAR code, we propose a novel, efficient decoding algorithm called EEL. Both theoretical and experimental performance evaluations show that EEL consistently outperforms a naive Try-and-Test approach by large factors in overall decoding throughput.