Replication for Availability and Fault Tolerance

Synonyms

Backup mechanisms; Fault-tolerance

Definition

Replication is a common mechanism for increasing the availability of a data service. The idea is to keep several copies of the database, each installed on a different site (a machine or a set of machines). With replication, the data remains available as long as at least one site is running and accessible. Fault tolerance is closely related to availability, and the two terms are often used interchangeably. A system is considered fault tolerant if it continues to work correctly despite the failure of individual components. When data and processes are replicated over several sites, the failure of any individual site can be masked, since the tasks executed by the failed site can be transferred to one of the available sites. In its strict definition, a fault-tolerant system must behave exactly like a system whose components never fail. This requires making failures transparent to clients and typically means that all data copies have to be consistent at all...
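The following is a minimal, illustrative sketch of the availability argument above. It is not the entry's own method. It makes several simplifying assumptions:

- Sites fail only by crashing.
- There are no network partitions and no concurrent clients.
- Every write is applied to all currently running copies (a read-one/write-all-available scheme, chosen here only for simplicity).

All names (`Site`, `ReplicatedStore`, `UnavailableError`, `availability`) are hypothetical. The `availability` helper also assumes that sites fail independently, each being up with probability `p_up`.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical site (machine or set of machines) holding a full copy of the data."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


class UnavailableError(Exception):
    """Raised when no site is running, so the data service is unavailable."""


class ReplicatedStore:
    """Toy read-one/write-all-available replication over several sites.

    Assumes crash failures only, no partitions, and a single client.
    """

    def __init__(self, sites):
        self.sites = list(sites)

    def _live(self):
        return [s for s in self.sites if s.up]

    def _by_name(self, name):
        return next(s for s in self.sites if s.name == name)

    def write(self, key, value):
        live = self._live()
        if not live:
            raise UnavailableError("no site is running")
        # Apply the update to every running copy so the live copies stay identical.
        for site in live:
            site.data[key] = value

    def read(self, key):
        # Any running site can answer; failed sites are skipped, which masks
        # their failure from the client.
        for site in self._live():
            return site.data.get(key)
        raise UnavailableError("no site is running")

    def crash(self, name):
        self._by_name(name).up = False

    def recover(self, name):
        site = self._by_name(name)
        live = self._live()
        # Hypothetical state transfer: copy the current state from a live site.
        # If no site is live, the recovering copy may be stale; a real system
        # must detect this case instead of silently serving old data.
        if live:
            site.data = dict(live[0].data)
        site.up = True


def availability(p_up, n):
    """Probability that at least one of n sites is up, assuming independent failures."""
    return 1 - (1 - p_up) ** n
```

In a short run, the data survives as long as one copy is running:

```python
store = ReplicatedStore([Site("A"), Site("B"), Site("C")])
store.write("x", 1)
store.crash("A")
store.read("x")      # served by B
```

Under the independence assumption, three sites that are each up 99% of the time give `availability(0.99, 3)`, roughly 0.999999. The sketch deliberately ignores concurrent updates and partitions. Handling them is exactly what makes strict fault tolerance, with all copies consistent, hard to achieve.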

Author information

Correspondence to Bettina Kemme.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Kemme, B. (2018). Replication for Availability and Fault Tolerance. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80723