Synonyms
Backup mechanisms; Fault-tolerance
Definition
Replication is a common mechanism to increase the availability of a data service. The idea is to keep several copies of the database, each installed on a different site (a machine or set of machines). With replication, the data remains available as long as at least one site is running and accessible. Fault tolerance is related to availability, and the two terms are often used interchangeably. A system is considered fault tolerant if it continues to work correctly despite the failure of individual components. When data and processes are replicated over several sites, the failure of any individual site can be masked, since the tasks executed by the failed site can be transferred to one of the available sites. In its strict definition, a fault-tolerant system must behave exactly like a system in which components never fail. This requires making failures transparent to clients and typically means that all data copies have to be consistent at all...
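The idea that data stays available while at least one site is reachable, and that a failed site's work can be taken over by another, can be illustrated with a small sketch. This is a toy model and not the protocol of any particular system: the names `Site`, `ReplicatedStore`, `SiteUnavailable`, and `availability` are hypothetical, failures are simulated with a flag, writes go to every site that is currently up, and bringing a recovered site up to date is not modeled. The `availability` helper additionally assumes that sites fail independently, each being up with probability `p`.

```python
from dataclasses import dataclass, field


class SiteUnavailable(Exception):
    """Raised when a site (or every site) is down or unreachable."""


@dataclass
class Site:
    """One copy of the database; `up` simulates whether the site is reachable."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def read(self, key):
        if not self.up:
            raise SiteUnavailable(self.name)
        return self.data[key]

    def write(self, key, value):
        if not self.up:
            raise SiteUnavailable(self.name)
        self.data[key] = value


class ReplicatedStore:
    """Hypothetical client-side view of a set of replicas."""

    def __init__(self, sites):
        self.sites = list(sites)

    def write(self, key, value):
        # Assumption: apply the write to every site that is currently up.
        # Resynchronizing a site after it recovers is not modeled here.
        applied = 0
        for site in self.sites:
            try:
                site.write(key, value)
                applied += 1
            except SiteUnavailable:
                continue
        if applied == 0:
            raise SiteUnavailable("no site accepted the write")

    def read(self, key):
        # Mask failures: fall through to the next copy if a site is down.
        for site in self.sites:
            try:
                return site.read(key)
            except SiteUnavailable:
                continue
        raise SiteUnavailable("no site is reachable")


def availability(p, n):
    """Probability that at least one of n sites is up, assuming
    independent failures and per-site availability p."""
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    store = ReplicatedStore([Site("s1"), Site("s2"), Site("s3")])
    store.write("x", 1)
    store.sites[0].up = False   # two of the three sites fail
    store.sites[1].up = False
    print(store.read("x"))      # still served by the remaining site
    print(availability(0.9, 3))
```

In this sketch the client never observes the failure of the first two sites, which is the sense in which failures are masked. Keeping all copies consistent when sites fail and recover is the difficult part and is deliberately left out.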
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Kemme, B. (2018). Replication for Availability and Fault Tolerance. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80723
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80723
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering