CRSS publication: Evaluating the Impact of Irrecoverable Read Errors on Disk Array Reliability

Appeared in Proceedings of the IEEE 15th Pacific Rim International Symposium on Dependable Computing (PRDC09).

Abstract

We investigate the impact of irrecoverable read errors (also known as bad blocks) on the mean time to data loss (MTTDL) of mirrored disks, RAID level 5 arrays, and RAID level 6 arrays. Our study is based on the data collected by Bairavasundaram et al. from a population of 1.53 million disks over a period of 32 months. It indicates that irrecoverable read errors can reduce the MTTDL of the three arrays by up to 99 percent, effectively canceling most of the benefits of fast disk repairs. It also shows the benefits of frequent scrubbing scans that map out bad blocks, thus preventing future irrecoverable read errors. As an example, once-a-month scrubbing scans were found to improve the MTTDL of the three arrays by at least 300 percent compared to once-a-year scrubbing scans.
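
To give a feel for why bad blocks matter so much, here is a minimal sketch of a textbook two-state Markov model for a mirrored pair. It is not the model used in the paper. It assumes exponential disk failures and repairs, and it assumes that when one disk fails, the survivor holds an unreadable block with a fixed probability, which makes the rebuild fail and counts as immediate data loss. The function name `mirror_mttdl`, its parameters, and all numeric values are hypothetical and chosen only for illustration.

```python
def mirror_mttdl(disk_mttf_hours, repair_hours, p_bad_block=0.0):
    """MTTDL (hours) of a two-disk mirror under a simplified Markov model.

    Assumptions (illustrative, not taken from the paper):
      - each disk fails at constant rate lam = 1 / disk_mttf_hours
      - a failed disk is rebuilt at constant rate mu = 1 / repair_hours
      - with probability p_bad_block the surviving disk has an
        irrecoverable read error, so the first failure loses data at once
    """
    lam = 1.0 / disk_mttf_hours
    mu = 1.0 / repair_hours
    p = p_bad_block
    # Solving the expected-time equations
    #   T2 = 1/(2*lam) + (1 - p) * T1
    #   T1 = 1/(lam + mu) + mu/(lam + mu) * T2
    # for T2 (both disks initially working) gives:
    return (lam + mu + 2.0 * lam * (1.0 - p)) / (2.0 * lam * (lam + p * mu))


if __name__ == "__main__":
    # Hypothetical parameters: 100,000-hour disk MTTF, 24-hour repair.
    baseline = mirror_mttdl(100_000, 24)
    for p in (0.0, 0.001, 0.01, 0.1):
        mttdl = mirror_mttdl(100_000, 24, p)
        loss = 100.0 * (1.0 - mttdl / baseline)
        print(f"p={p:<6} MTTDL={mttdl:.3e} h  reduction={loss:.1f}%")
```

With `p_bad_block` set to 0 the expression reduces to the classic (3λ + μ)/(2λ²) result for a mirrored pair. Under these assumptions, even a small bad-block probability dominates the denominator when repairs are fast (μ much larger than λ), which is consistent in spirit with the abstract's observation that read errors cancel most of the benefit of fast repairs.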

Publication date:
November 2009

Authors:
Jehan-François Pâris
Ahmed Amer
Darrell D. E. Long
Thomas Schwarz

Projects:
Reliable Storage

Available media

Full paper text: PDF

Bibtex entry

@inproceedings{paris-prdc09,
  author       = {Jehan-François Pâris and Ahmed Amer and Darrell D. E. Long and Thomas Schwarz},
  title        = {Evaluating the Impact of Irrecoverable Read Errors on Disk Array Reliability},
  booktitle    = {Proceedings of the IEEE 15th Pacific Rim International Symposium on Dependable Computing (PRDC09)},
  month        = nov,
  year         = {2009},
}

Last modified 28 May 2019