Online Recovery in Parallel Database Systems
Continuous availability; High availability; 24×7 operation
Replication (also known as clustering) is a technique for providing high availability in parallel and distributed databases. High availability aims to provide continuous service operation, and it has two faces. On the one hand, it provides fault tolerance through redundancy in the form of replication, that is, by keeping multiple copies, or replicas, of the data at different sites. On the other hand, the sites holding the replicas may crash or otherwise fail, so failed or new replicas must be (re)introduced into the system to maintain a given degree of availability. Introducing a new replica requires transferring the current state to it in a consistent fashion, a process known as recovery. A simple solution to this problem is offline recovery: request processing is suspended to obtain a quiescent state, and then the state is transferred from a working replica (termed the recoverer replica) to the new...
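The offline recovery steps described above (suspend request processing, transfer the state from the recoverer replica, then resume) can be sketched as follows. This is a minimal, single-process illustration under stated assumptions, not a description of any particular system. The classes `Replica` and `ReplicatedDB`, the methods `submit`, `start_offline_recovery` and `complete_offline_recovery`, and the in-memory key/value state are all hypothetical. Requests that arrive while processing is suspended are held in a queue and applied to every replica, including the new one, once processing resumes.

```python
import copy
from collections import deque


class Replica:
    """Hypothetical site holding a full in-memory copy of the data."""

    def __init__(self, name):
        self.name = name
        self.state = {}  # key -> value

    def apply(self, key, value):
        self.state[key] = value


class ReplicatedDB:
    """Hypothetical replica group illustrating offline recovery."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.suspended = False
        self.pending = deque()  # requests delayed while recovery runs

    def submit(self, key, value):
        if self.suspended:
            # No request is processed during offline recovery; it waits here.
            self.pending.append((key, value))
        else:
            # Normal operation: every update is applied at every replica.
            for replica in self.replicas:
                replica.apply(key, value)

    def start_offline_recovery(self):
        # Step 1: suspend request processing to reach a quiescent state.
        self.suspended = True

    def complete_offline_recovery(self, recoverer, new_replica):
        if not self.suspended:
            raise RuntimeError("state transfer requires a quiescent state")
        # Step 2: copy the full state from the recoverer replica. No update
        # is applied during the copy, so the snapshot is consistent.
        new_replica.state = copy.deepcopy(recoverer.state)
        self.replicas.append(new_replica)
        # Step 3: resume processing, starting with the delayed requests.
        self.suspended = False
        while self.pending:
            self.submit(*self.pending.popleft())


a, b = Replica("a"), Replica("b")
db = ReplicatedDB([a])
db.submit("x", 1)
db.start_offline_recovery()
db.submit("x", 2)  # arrives during recovery: queued, not yet applied
db.complete_offline_recovery(recoverer=a, new_replica=b)
print(a.state, b.state)  # {'x': 2} {'x': 2}
```

In this sketch, the `pending` queue grows for the entire duration of the state transfer, during which no request is served. That period of unavailability is the cost of the simple offline approach.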