Analysis of Repair Cost in Distributed Storage Systems with Fault-Tolerant Coding Strategies

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9531)

Abstract

To achieve reliability, distributed storage systems adopt fault-tolerance techniques such as replication. With the rapid growth of data, these systems have been transitioning from replication to coding strategies such as Reed-Solomon codes, which achieve higher storage efficiency. However, the repair cost of Reed-Solomon codes in terms of network bandwidth is high. To improve repair efficiency, a new class of codes called regenerating codes has been proposed and is becoming increasingly popular. How to quantify and evaluate the repair cost of these coding strategies at the system level, however, remains unexplored. In this paper, we propose a metric for repair cost at the level of the whole system and use it to compare the two main classes of codes, Reed-Solomon codes and regenerating codes. Our goal is to provide system designers with methods for evaluating system-level repair cost, so that they can choose the coding strategy best suited to their particular systems.
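As a rough illustration of why repair bandwidth differs between the two classes of codes, the sketch below computes the single-node repair traffic for conventional Reed-Solomon repair and for the two extreme points of regenerating codes (MSR and MBR), using the standard formulas from the regenerating-codes literature (Dimakis et al.). It is not the system-level metric proposed in the paper, which the abstract does not spell out. The function names and the parameters `file_size`, `k`, and `d` (number of helper nodes contacted during repair) are assumptions chosen for this example.

```python
from fractions import Fraction


def rs_repair_bandwidth(file_size: int, k: int) -> Fraction:
    """Conventional repair of an (n, k) Reed-Solomon code.

    The replacement node downloads k blocks of size file_size / k,
    i.e. the equivalent of the whole file, to rebuild one lost block.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return Fraction(file_size)


def msr_repair_bandwidth(file_size: int, k: int, d: int) -> Fraction:
    """Repair traffic at the minimum-storage regenerating (MSR) point.

    Each node stores file_size / k; repair contacts d >= k helpers and
    downloads file_size * d / (k * (d - k + 1)) in total.
    """
    if d < k:
        raise ValueError("d must be at least k")
    return Fraction(file_size * d, k * (d - k + 1))


def mbr_repair_bandwidth(file_size: int, k: int, d: int) -> Fraction:
    """Repair traffic at the minimum-bandwidth regenerating (MBR) point.

    Total download is 2 * file_size * d / (k * (2 * d - k + 1)); each node
    stores this same amount, which is more than file_size / k.
    """
    if d < k:
        raise ValueError("d must be at least k")
    return Fraction(2 * file_size * d, k * (2 * d - k + 1))


if __name__ == "__main__":
    # Hypothetical parameters: unit-size file, k = 10, d = 13 helpers.
    M, k, d = 1, 10, 13
    print("RS :", rs_repair_bandwidth(M, k))
    print("MSR:", msr_repair_bandwidth(M, k, d))
    print("MBR:", mbr_repair_bandwidth(M, k, d))
```

With `d = k` the MSR formula reduces to the full file size, matching conventional Reed-Solomon repair; the savings come from contacting more than `k` helpers. These are per-failure figures only and ignore failure rates, concurrent failures, and repair scheduling, which a system-level analysis would need to account for.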

Acknowledgments

This research is supported in part by the Major State Basic Research Development Program of China (973 Program, 2012CB315803), the National Natural Science Foundation of China (61371078), and the Research Fund for the Doctoral Program of Higher Education of China (20130002110051).

Author information

Corresponding authors

Correspondence to Yanbo Lu or Shu-Tao Xia.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Lu, Y., Hao, J., Liu, XJ., Xia, ST. (2015). Analysis of Repair Cost in Distributed Storage Systems with Fault-Tolerant Coding Strategies. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9531. Springer, Cham. https://doi.org/10.1007/978-3-319-27140-8_25

  • DOI: https://doi.org/10.1007/978-3-319-27140-8_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27139-2

  • Online ISBN: 978-3-319-27140-8

  • eBook Packages: Computer Science, Computer Science (R0)
