Abstract
Media failures usually leave database systems unavailable for several hours until recovery is complete, especially in applications with large devices and high transaction volume. Previous work introduced a technique called single-pass restore, which increases restore bandwidth and thus substantially decreases time to repair. Instant restore goes further as it permits read/write access to any data on a device undergoing restore—even data not yet restored—by restoring individual data segments on demand. Thus, the restore process is guided primarily by the needs of applications, and the observed mean time to repair is effectively reduced from several hours to a few seconds.
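The on-demand idea can be illustrated with a toy model. The sketch below is not the paper's implementation: the class name `InstantRestoreDevice`, the segment size, and the representation of log records as whole-page overwrites are all assumptions made for illustration. It only shows how an access to a not-yet-restored page can trigger restore of its segment from a backup plus a log archive, while a background step restores the remaining segments.

```python
from collections import defaultdict

SEGMENT_SIZE = 4  # pages per segment (hypothetical value)


class InstantRestoreDevice:
    """Toy replacement device that restores segments lazily (illustrative only)."""

    def __init__(self, backup, log_archive, num_pages):
        self.backup = backup        # page_id -> contents at backup time
        self.num_pages = num_pages
        self.pages = {}             # pages already present on the replacement device
        self.restored = set()       # segment numbers already restored
        # Partition the log archive by segment, preserving log order.
        # Assumption: each record is (page_id, new_value), i.e. a full overwrite.
        self.log_by_segment = defaultdict(list)
        for page_id, new_value in log_archive:
            self.log_by_segment[page_id // SEGMENT_SIZE].append((page_id, new_value))

    def _restore_segment(self, seg):
        if seg in self.restored:
            return
        first = seg * SEGMENT_SIZE
        last = min(first + SEGMENT_SIZE, self.num_pages)
        # Step 1: copy the segment's pages from the backup.
        for page_id in range(first, last):
            self.pages[page_id] = self.backup.get(page_id)
        # Step 2: replay the segment's log records in log order.
        for page_id, new_value in self.log_by_segment.get(seg, []):
            self.pages[page_id] = new_value
        self.restored.add(seg)

    def read(self, page_id):
        # An access to unrestored data restores its segment first.
        self._restore_segment(page_id // SEGMENT_SIZE)
        return self.pages[page_id]

    def write(self, page_id, value):
        self._restore_segment(page_id // SEGMENT_SIZE)
        self.pages[page_id] = value

    def restore_next(self):
        """Background step: restore the lowest-numbered pending segment."""
        total = (self.num_pages + SEGMENT_SIZE - 1) // SEGMENT_SIZE
        for seg in range(total):
            if seg not in self.restored:
                self._restore_segment(seg)
                return seg
        return None
```

In this model, a transaction that reads page 5 immediately after the failure waits only for the segment holding page 5, not for the whole device; calling `restore_next()` repeatedly completes the restore in the background.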
This paper presents an implementation and evaluation of instant restore. The technique is implemented incrementally, starting from a system with the traditional ARIES design for logging and recovery. Experiments show that the transaction latency perceived after a media failure can be cut down to less than a second. The net effect is that a few “nines” of availability are added to the system using simple and low-overhead software techniques.
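The link between repair time and "nines" of availability can be made concrete with the standard formula availability = MTTF / (MTTF + MTTR). The numbers below are hypothetical and chosen only for illustration (a mean time to failure of 1000 hours, and repair times of 3 hours versus 3 seconds); they are not measurements from the paper, and the calculation considers media failures only.

```python
import math


def unavailability(mttf_hours, mttr_hours):
    """Fraction of time the system is down: MTTR / (MTTF + MTTR)."""
    return mttr_hours / (mttf_hours + mttr_hours)


def nines(mttf_hours, mttr_hours):
    """Number of 'nines' of availability, e.g. 0.999 -> 3.0."""
    return -math.log10(unavailability(mttf_hours, mttr_hours))


MTTF_HOURS = 1000.0               # hypothetical mean time to failure
MTTR_TRADITIONAL = 3.0            # hypothetical: several hours of restore
MTTR_INSTANT = 3.0 / 3600.0       # hypothetical: a few seconds, in hours

nines_before = nines(MTTF_HOURS, MTTR_TRADITIONAL)
nines_after = nines(MTTF_HOURS, MTTR_INSTANT)

print(f"traditional restore: {nines_before:.2f} nines")
print(f"instant restore:     {nines_after:.2f} nines")
print(f"gain:                {nines_after - nines_before:.2f} nines")
```

Under these assumed values, shrinking the repair time by roughly three orders of magnitude adds roughly three nines, because unavailability is nearly proportional to MTTR when MTTR is much smaller than MTTF.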
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Sauer, C., Graefe, G., Härder, T. (2017). Instant Restore After a Media Failure. In: Kirikova, M., Nørvåg, K., Papadopoulos, G. (eds) Advances in Databases and Information Systems. ADBIS 2017. Lecture Notes in Computer Science, vol 10509. Springer, Cham. https://doi.org/10.1007/978-3-319-66917-5_21
DOI: https://doi.org/10.1007/978-3-319-66917-5_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66916-8
Online ISBN: 978-3-319-66917-5
eBook Packages: Computer Science, Computer Science (R0)