RAID-6Plus: A Fast and Reliable Coding Scheme Aided by Multi-failure Degradation

  • Ming-Zhu Deng
  • Yang Ou
  • Nong Xiao
  • Song-Ping Yu
  • Wei Chen
  • Zhi-Guang Chen
  • Fang Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9464)

Abstract

Existing triple-failure-tolerant codes assume that failures are independent and instantaneous. These assumptions overlook the underlying mechanism of multi-failure occurrences and ignore the effect of the reconstruction window, so the codes are not adapted to the failure patterns seen in real-world applications. As a result, the third parity drive sits almost idle: it is set to handle only the triple-failure scenario, leaving lower-level failure situations unattended. Furthermore, the single-failure rebuild problem worsens as disk capacity increases, which lowers the system’s reliability and impairs the user experience. To address these problems, this study develops RAID-6Plus, a fast-reconstructable coding scheme extended from RAID-6. RAID-6Plus maintains a smaller reconstruction window by recoding the third parity drive. Existing codes provide absolute reliability for triple failures via full combinations. In contrast, RAID-6Plus remakes the third parity drive with short combinations, which can heavily reuse overlapped elements during reconstruction. The short combinations shorten the reconstruction window of a single failure, which avoids multi-failure overlapping in the reconstruction window. This capability of multi-failure degradation gives RAID-6Plus (1) better system performance than RTP and STAR and (2) higher reliability than RAID-6.
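
To make the link between window length and overlapping failures concrete, the following is a minimal sketch and not the paper's reliability model. It assumes independent, exponentially distributed disk lifetimes, which is exactly the simplification the abstract criticizes, so it only illustrates the direction of the effect. The function name, its parameters, and the sample figures (9 surviving disks, an MTTF of 1,000,000 hours) are hypothetical.

```python
import math


def overlap_probability(surviving_disks: int, window_hours: float,
                        mttf_hours: float) -> float:
    """Probability that at least one surviving disk fails during the
    reconstruction window.

    ASSUMPTION (not from the paper): disk lifetimes are independent and
    exponentially distributed with mean mttf_hours.
    """
    # Combined failure rate of all surviving disks, in failures per hour.
    rate = surviving_disks / mttf_hours
    # 1 - exp(-rate * window), computed with expm1 for small arguments.
    return -math.expm1(-rate * window_hours)


if __name__ == "__main__":
    # Illustrative values only: halving the window roughly halves the risk.
    for window in (24.0, 12.0, 6.0):
        p = overlap_probability(surviving_disks=9, window_hours=window,
                                mttf_hours=1_000_000.0)
        print(f"window = {window:4.1f} h   P(overlap) = {p:.3e}")
```

Under these assumptions the overlap probability grows almost linearly with the window length when the window is short relative to the MTTF, which is why shortening the single-failure rebuild reduces exposure to a second or third failure.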

Keywords

Reconstruction window, Failure mode, Multi-failure degradation, Flexible reliability

Notes

Acknowledgment

We are grateful to our anonymous reviewers for their suggestions to improve this paper. This work is supported by the National Natural Science Foundation of China under Grant Nos. 61232003, 61332003, 61202121, 61402503, 61303073.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Ming-Zhu Deng (1)
  • Yang Ou (1)
  • Nong Xiao (1)
  • Song-Ping Yu (1)
  • Wei Chen (1)
  • Zhi-Guang Chen (1)
  • Fang Liu (1)
  1. State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha, China
