Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Data Replication and Encoding

  • Behrooz Parhami
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_174-1


Definition of Entry

Conventional, well-established redundancy methods for preventing data loss, unavailability, or corruption can be used to protect big data, but they must be updated to apply efficiently to large data sets.


Data stored in memory devices, in storage networks, on the Web, or in the Cloud must be protected against loss, accidental contamination, or deliberate adulteration. Data are valuable assets that can be lost to negligence or theft (for illicit use or to exchange for ransom). Over the years, researchers in the field of dependable and fault-tolerant computing have devised many methods of data protection (Jalote 1994; Parhami 2018), all of which introduce redundancy to make data robust and recoverable in the event of loss or corruption. As data assume ever more important roles in the proper functioning of systems that affect our daily lives, greater...



References

  1. Arazi B (1987) A commonsense approach to the theory of error-correcting codes. MIT Press, Cambridge, MA
  2. Armbrust M et al (2010) A view of cloud computing. Commun ACM 53(4):50–58
  3. Avizienis A (1971) Arithmetic error codes: cost and effectiveness studies for application in digital system design. IEEE Trans Comput 20(11):1322–1331
  4. Avizienis A (1973) Arithmetic algorithms for error-coded operands. IEEE Trans Comput 22(6):567–572
  5. Beimel A (2011) Secret-sharing schemes: a survey. In: Proceedings of international conference coding and cryptology, Springer LNCS no. 6639, Berlin, pp 11–46
  6. Benedetto S, Montorsi G (1996) Unveiling turbo codes: some results on parallel concatenated coding schemes. IEEE Trans Inf Theory 42(2):409–428
  7. Berger JM (1961) A note on error detection codes for asymmetric channels. Inf Control 4:68–73
  8. Budhiraja N, Marzullo K, Schneider FB, Toueg S (1993) The primary-backup approach. Distrib Syst 2:199–216
  9. Caulfield AM et al (2016) A cloud-scale acceleration architecture. In: Proceedings of 49th IEEE/ACM international symposium on microarchitecture, Taipei, Taiwan, pp 1–13
  10. Chen CLP, Zhang C-Y (2014) Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inf Sci 275:314–347
  11. Chen PM, Lee EK, Gibson GA, Katz RH, Patterson DA (1994) RAID: high-performance reliable secondary storage. ACM Comput Surv 26(2):145–185
  12. Dimakis AG, Ramachandran K, Wu Y, Suh C (2011) A survey on network codes for distributed storage. Proc IEEE 99(3):476–489
  13. Dullmann D et al (2001) Models for replica synchronization and consistency in a data grid. In: Proceedings of 10th IEEE international symposium on high performance distributed computing, San Francisco, CA, pp 67–75
  14. Feng G-L, Deng RH, Bao F, Shen J-C (2005) New efficient MDS array codes for RAID – part I: Reed-Solomon-like codes for tolerating three disk failures. IEEE Trans Comput 54(9):1071–1080; Part II: Rabin-like codes for tolerating multiple (≥ 4) disk failures. IEEE Trans Comput 54(12):1473–1483
  15. Gallager R (1962) Low-density parity-check codes. IRE Trans Inf Theory 8(1):21–28
  16. Garner HL (1966) Error codes for arithmetic operations. IEEE Trans Electron Comput 5:763–770
  17. Garrett P (2004) The mathematics of coding theory. Prentice Hall, Upper Saddle River
  18. Guerraoui R, Schiper A (1997) Software-based replication for fault tolerance. IEEE Comput 30(4):68–74
  19. Guruswami V, Rudra A (2009) Error correction up to the information-theoretic limit. Commun ACM 52(3):87–95
  20. Hamming RW (1950) Error detecting and error correcting codes. Bell Labs Tech J 29(2):147–160
  21. Hankerson R et al (2000) Coding theory and cryptography: the essentials. Marcel Dekker, New York
  22. Hilbert M, López P (2011) The World’s technological capacity to store, communicate, and compute information. Science 332:60–65
  23. Hu H, Wen Y, Chua T-S, Li X (2014) Toward scalable systems for big data analytics: a technology tutorial. IEEE Access 2:652–687
  24. Iyengar A, Cahn R, Garay JA, Jutla C (1998) Design and implementation of a secure distributed data repository. IBM Thomas J. Watson Research Division, Yorktown Heights
  25. Jalote P (1994) Fault tolerance in distributed systems. Prentice Hall, Englewood Cliffs
  26. Knuth DE (1986) Efficient balanced codes. IEEE Trans Inf Theory 32(1):51–53
  27. Lin S, Costello DJ (2004) Error control coding, vol 2. Prentice Hall, Upper Saddle River
  28. McAuley AJ (1994) Weighted sum codes for error detection and their comparison with existing codes. IEEE/ACM Trans Networking 2(1):16–22
  29. Parhami B (2018) Dependable computing: a multi-level approach. Draft of book manuscript, available on-line at: http://www.ece.ucsb.edu/~parhami/text_dep_comp.htm
  30. Parhami B, Avizienis A (1978) Detection of storage errors in mass memories using arithmetic error codes. IEEE Trans Comput 27(4):302–308
  31. Petascale Data Storage Institute (2012) Analyzing failure data. Project Web site: http://www.pdl.cmu.edu/PDSI/FailureData/index.html
  32. Peterson WW, Brown DT (1961) Cyclic codes for error detection. Proc IRE 49(1):228–235
  33. Peterson WW, Weldon EJ Jr (1972) Error-correcting codes, 2nd edn. MIT Press, Cambridge, MA
  34. Pless V (1998) Bose-Chaudhuri-Hocquenghem (BCH) codes. In: Introduction to the theory of error-correcting codes, 3rd edn. Wiley, New York, pp 109–222
  35. Rabin M (1989) Efficient dispersal of information for security, load balancing, and fault tolerance. J ACM 36(2):335–348
  36. Rao TRN, Fujiwara E (1989) Error-control coding for computer systems. Prentice Hall, Upper Saddle River, NJ
  37. Reed I, Solomon G (1960) Polynomial codes over certain finite fields. SIAM J Appl Math 8:300–304
  38. Schroeder B, Gibson GA (2007) Understanding disk failure rates: what does an MTTF of 1,000,000 hours mean to you? ACM Trans Storage 3(3):Article 8, 31 pp
  39. Sklar B, Harris FJ (2004) The ABCs of linear block codes. IEEE Signal Process 21(4):14–35
  40. Stanford University (2012) 21st century computer architecture: a community white paper. On-line: http://csl.stanford.edu/~christos/publications/2012.21stcenturyarchitecture.whitepaper.pdf
  41. Wakerly JF (1978) Error detecting codes, self-checking circuits and applications. North Holland, New York

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, University of California, Santa Barbara, USA

Section editors and affiliations

  • Bingsheng He
  • Behrooz Parhami (1)
  1. Dept. of Electrical and Computer Engineering, University of California, Santa Barbara, United States