Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Online Recovery in Parallel Database Systems

  • Ricardo Jiménez-Peris
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1089


Continuous availability; High availability; 24×7 operation


Replication (also known as clustering) is a technique for providing high availability in parallel and distributed databases. High availability aims at continuous service operation and has two facets. On the one hand, it provides fault tolerance by introducing redundancy in the form of replication, that is, by keeping multiple copies (replicas) of the data at different sites. On the other hand, since sites holding replicas may crash or otherwise fail, maintaining a given degree of availability requires reintroducing failed replicas, or introducing new ones, into the system. Introducing a replica requires transferring the current state to it in a consistent fashion (a process known as recovery). A simple solution to this problem is offline recovery: to obtain a quiescent state, request processing is suspended, and then the state is transferred from a working replica (termed the recoverer replica) to the new...
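
The offline approach outlined above can be illustrated with a minimal sketch. It is not taken from the entry: the names `Replica`, `ReplicatedStore`, and `offline_recover` are hypothetical, the database state is reduced to an in-memory dictionary, and the final step (adding the new replica and resuming request processing) is an assumption, since the text above is truncated at that point.

```python
import copy
import threading


class Replica:
    """Toy in-memory replica: a key-value state plus a count of applied updates."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.applied = 0

    def apply(self, key, value):
        self.state[key] = value
        self.applied += 1


class ReplicatedStore:
    """Hypothetical coordinator that applies every update to all replicas."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        # Held while request processing is suspended.
        self._gate = threading.Lock()

    def write(self, key, value):
        # Writers block here while a recovery holds the gate.
        with self._gate:
            for replica in self.replicas:
                replica.apply(key, value)

    def offline_recover(self, new_replica):
        # 1. Suspend request processing so that the state is quiescent.
        with self._gate:
            # Any working replica can act as the recoverer replica.
            recoverer = self.replicas[0]
            # 2. Transfer the full state to the new replica.
            new_replica.state = copy.deepcopy(recoverer.state)
            new_replica.applied = recoverer.applied
            # 3. (Assumed step) register the new replica before resuming.
            self.replicas.append(new_replica)
        # Leaving the block releases the gate, so request processing resumes.


store = ReplicatedStore([Replica("r1"), Replica("r2")])
store.write("x", 1)
store.write("y", 2)

r3 = Replica("r3")
store.offline_recover(r3)  # no writes are processed during the transfer
store.write("x", 10)       # applied to all three replicas, including r3
```

The sketch makes the cost of the offline approach visible: every write waits for the whole state transfer to finish, so the service is unavailable for as long as the copy takes.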


Recommended Reading

  1. Bernstein PA, Hadzilacos V, Goodman N. Concurrency control and recovery in database systems. Reading: Addison Wesley; 1987.
  2. Castro M, Liskov B. Practical Byzantine fault tolerance and proactive recovery. ACM Trans Comput Syst. 2002;20(4):398–461.
  3. Gançarski S, Naacke H, Pacitti E, Valduriez P. The Leganet system: freshness-aware transaction routing in a database cluster. Inform Syst. 2007;32(2):320–43.
  4. Gashi I, Popov P, Strigini L. Fault tolerance via diversity for off-the-shelf products: a study with SQL database servers. IEEE Trans Depend Secur Comput. 2007;4(4):280–94.
  5. Jiménez-Peris R, Patiño-Martínez M, Alonso G. Non-intrusive, parallel recovery of replicated data. In: Proceedings of the 21st Symposium on Reliable Distributed Systems; 2002. p. 150–9.
  6. Kemme B, Alonso G. Don’t be lazy, be consistent: Postgres-R, a new way to implement database replication. In: Proceedings of the 26th International Conference on Very Large Data Bases; 2000. p. 134–43.
  7. Kemme B, Alonso G. A new approach to developing and implementing eager database replication protocols. ACM Trans Database Syst. 2000;25(3):333–79.
  8. Kemme B, Bartoli A, Babaoglu O. Online reconfiguration in replicated databases based on group communication. In: Proceedings of the International Conference on Dependable Systems and Networks; 2001. p. 117–30.
  9. Lau E, Madden S. An integrated approach to recovery and high availability in an updatable, distributed data warehouse. In: Proceedings of the 32nd International Conference on Very Large Data Bases; 2006. p. 703–14.
  10. Manassiev K, Amza C. Scaling and continuous availability in database server clusters through multiversion replication. In: Proceedings of the International Conference on Dependable Systems and Networks; 2007. p. 666–76.
  11. Özsu MT, Valduriez P. Principles of distributed database systems. 2nd ed. Upper Saddle River: Prentice-Hall; 1999.
  12. Pacitti E, Simon E. Update propagation strategies to improve freshness in lazy master replicated databases. VLDB J. 2000;8(3):305–18.
  13. Patiño-Martínez M, Jiménez-Peris R, Kemme B, Alonso G. Middle-R: consistent database replication at the middleware level. ACM Trans Comput Syst. 2005;23(4):375–423.
  14. Pedone F, Guerraoui R, Schiper A. The database state machine approach. Distrib Parallel Databases. 2003;14(1):71–98.
  15. Plattner C, Alonso G. Ganymed: scalable replication for transactional web applications. In: Proceedings of the ACM/IFIP/USENIX 5th International Middleware Conference; 2004. p. 155–74.
  16. PostgreSQL. PostgreSQL point in time recovery. http://www.postgresql.org/docs/8.0/interactive/backup-online.html.
  17. Vandiver B, Balakrishnan H, Liskov B, Madden S. Tolerating Byzantine faults in database systems using commit barrier scheduling. In: Proceedings of the 21st ACM Symposium on Operating System Principles; 2007. p. 59–72.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Distributed Systems Lab, Universidad Politécnica de Madrid, Madrid, Spain

Section editors and affiliations

  • Patrick Valduriez (1)
  1. INRIA, LINA, Nantes, France