Replication for Availability and Fault Tolerance

Kemme, Bettina

doi:10.1007/978-1-4614-8265-9_80723

Reference work entry
First Online: 01 January 2018

Synonyms

Backup mechanisms; Fault-tolerance

Definition

Replication is a common mechanism for increasing the availability of a data service. The idea is to keep several copies of the database, each installed on a different site (a machine or set of machines). With replication, the data remains available as long as at least one site is running and accessible. Fault tolerance is related to availability, and the two terms are often used interchangeably. A system is considered fault tolerant if it continues to work correctly despite the failure of individual components. When data and processes are replicated over several sites, the failure of any individual site can be masked, because the tasks executed by the failed site can be transferred to one of the available sites. In its strict definition, a fault-tolerant system must behave exactly like a system whose components never fail. This requires making failures transparent to clients and typically means that all data copies have to be consistent at all...
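The masking of site failures described above can be illustrated with a minimal sketch. It is not taken from the entry: the class names `Site`, `ReplicatedStore`, and `SiteDown` are hypothetical, each site is modeled as an in-memory dictionary, and writes simply go to every site that is currently running. Recovery of a failed site, which would have to bring its stale copy up to date, is deliberately left out.

```python
class SiteDown(Exception):
    """Raised when a request reaches a site that has failed (hypothetical)."""


class Site:
    """One copy of the database, held on one site (illustrative model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

    def read(self, key):
        if not self.up:
            raise SiteDown(self.name)
        return self.data.get(key)

    def write(self, key, value):
        if not self.up:
            raise SiteDown(self.name)
        self.data[key] = value


class ReplicatedStore:
    """Client-side view that hides individual site failures."""

    def __init__(self, sites):
        self.sites = list(sites)

    def write(self, key, value):
        """Apply the write to every running site; return how many accepted it."""
        applied = 0
        for site in self.sites:
            try:
                site.write(key, value)
                applied += 1
            except SiteDown:
                continue  # skip the failed site, keep the others consistent
        if applied == 0:
            raise SiteDown("no site available")
        return applied

    def read(self, key):
        """Return the value from the first running site, failing over on error."""
        for site in self.sites:
            try:
                return site.read(key)
            except SiteDown:
                continue  # transfer the request to the next site
        raise SiteDown("no site available")


store = ReplicatedStore([Site("a"), Site("b"), Site("c")])
store.write("x", 1)
store.sites[0].up = False  # two of the three sites fail
store.sites[1].up = False
print(store.read("x"))  # the remaining site still serves the read
```

The client sees the same answer whether or not sites have failed, which is the sense in which failures are made transparent. Only when every site is down does the store report an error.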

Recommended Reading

Bernstein PA, Goodman N. An algorithm for concurrency control and recovery in replicated distributed databases. ACM Trans Database Syst. 1984;9(4):596–615.
Bernstein PA, Hadzilacos V, Goodman N. Concurrency control and recovery in database systems. Reading: Addison Wesley; 1987.
Budhiraja N, Marzullo K, Schneider FB, Toueg S. The primary-backup approach. In: Mullender S, editor. Distributed systems. 2nd ed. Harlow/Munich: Addison Wesley; 1993. p. 199–216.
Corbett JC, Dean J, Epstein M, Fikes A, Frost C, Furman JJ, Ghemawat S, Gubarev A, Heiser C, Hochschild P, Hsieh WC, Kanthak S, Kogan E, Li H, Lloyd A, Melnik S, Mwaura D, Nagle D, Quinlan S, Rao R, Rolig L, Saito Y, Szymaniak M, Taylor C, Wang R, Woodford D. Spanner: Google’s globally distributed database. ACM Trans Comput Syst. 2013;31(3):8.
DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, Sivasubramanian S, Vosshall P, Vogels W. Dynamo: Amazon’s highly available key-value store. In: Proceedings of the 21st ACM Symposium on Operating System Principles; 2007. p. 205–20.
Ghemawat S, Gobioff H, Leung S. The Google file system. In: Proceedings of the 19th ACM Symposium on Operating System Principles; 2003. p. 29–43.
Gilbert S, Lynch NA. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News. 2002;33(2):51–9.
Gray J, Helland P, O’Neil P, Shasha D. The dangers of replication and a solution. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1996. p. 173–82.
Hunt P, Konar M, Junqueira FP, Reed B. ZooKeeper: wait-free coordination for internet-scale systems. In: Proceedings of the USENIX 2010 Annual Technical Conference; 2010.
Jiménez-Peris R, Patiño-Martínez M, Alonso G, Kemme B. Are quorums an alternative for data replication? ACM Trans Database Syst. 2003;28(3):257–94.
Kemme B, Bartoli A, Babaoglu Ö. Online reconfiguration in replicated databases based on group communication. In: Proceedings of the International Conference on Dependable Systems and Networks; 2001. p. 117–30.
Lakshman A, Malik P. Cassandra: a decentralized structured storage system. Oper Syst Rev. 2010;44(2):35–40.
Lamport L. The part-time parliament. ACM Trans Comput Syst. 1998;16(2):133–69.
Rao J, Shekita EJ, Tata S. Using Paxos to build a scalable, consistent, and highly available datastore. Proc VLDB Endow. 2011;4(4):243–54.
Satyanarayanan M, Kistler JJ, Kumar P, Okasaki ME, Siegel EH, Steere DC. Coda: a highly available file system for a distributed workstation environment. IEEE Trans Comput. 1990;39(4):447–59.
Terry DB, Theimer M, Petersen K, Demers AJ, Spreitzer M, Hauser C. Managing update conflicts in Bayou, a weakly connected replicated storage system. In: Proceedings of the 15th ACM Symposium on Operating System Principles; 1995. p. 172–83.
Thomas RH. A majority consensus approach to concurrency control for multiple copy databases. ACM Trans Database Syst. 1979;4(2):180–209.

Author information

Authors and Affiliations

School of Computer Science, McGill University, Montreal, QC, Canada
Bettina Kemme

Corresponding author

Correspondence to Bettina Kemme.

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

Cite this entry

Kemme, B. (2018). Replication for Availability and Fault Tolerance. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80723

DOI: https://doi.org/10.1007/978-1-4614-8265-9_80723
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering