
Effect of Codeword Placement on the Reliability of Erasure Coded Data Storage Systems

  • Vinodh Venkatesan
  • Ilias Iliadis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8054)

Abstract

Modern data storage systems employ advanced erasure codes to protect data from storage node failures because of their ability to provide high data reliability at high storage efficiency. In contrast to previous studies, we consider the practical case where the length of codewords in an erasure coded system is much smaller than the number of storage nodes in the system. In this case, there exists a large number of possible ways in which different codewords can be stored across the nodes of the system. In this paper, it is shown that a declustered placement of codewords can significantly improve system reliability compared to other placement schemes. A detailed reliability analysis is presented that accounts for the rebuild times involved, the amounts of partially rebuilt data when additional nodes fail during rebuild, and an intelligent rebuild process that attempts to rebuild the most critical codewords first.
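The contrast between placement schemes can be illustrated with a small sketch. It is not the paper's analysis. It assumes that a "clustered" placement partitions the n nodes into disjoint groups of m nodes and keeps each codeword inside one group, and that a "declustered" placement uses every m-node subset equally. The function names (`clustered_placement`, `declustered_placement`, `rebuild_partners`) and the values n = 12, m = 4 are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import comb


def clustered_placement(n, m):
    """Assumed definition: split n nodes into n // m disjoint groups of m nodes.

    Each returned tuple is one set of nodes that stores codewords.
    """
    if n % m != 0:
        raise ValueError("this sketch needs n to be a multiple of m")
    return [tuple(range(start, start + m)) for start in range(0, n, m)]


def declustered_placement(n, m):
    """Assumed definition: every m-node subset of the n nodes stores codewords."""
    return list(combinations(range(n), m))


def rebuild_partners(placement, failed_node):
    """Surviving nodes that share at least one codeword with the failed node."""
    partners = set()
    for nodes in placement:
        if failed_node in nodes:
            partners.update(nodes)
    partners.discard(failed_node)
    return partners


if __name__ == "__main__":
    n, m = 12, 4  # hypothetical system size and codeword length
    clustered = clustered_placement(n, m)
    declustered = declustered_placement(n, m)

    # Number of distinct node sets that hold codewords under each scheme.
    print(len(clustered), len(declustered), comb(n, m))

    # Number of surviving nodes holding symbols of codewords hit by a failure of node 0.
    print(len(rebuild_partners(clustered, 0)), len(rebuild_partners(declustered, 0)))
```

Under these assumptions, a single node failure touches codewords held by only m - 1 other nodes in the clustered case, but by all n - 1 other nodes in the declustered case. More surviving nodes can therefore contribute to the rebuild, which is one plausible reason why rebuild times enter the reliability comparison.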

Keywords

Storage System; Data Loss; Spread Factor; Node Failure; Placement Scheme
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Vinodh Venkatesan (1)
  • Ilias Iliadis (1)
  1. IBM Research – Zurich, Rüschlikon, Switzerland
