Parallel replication across formats for scaling out mixed OLTP/OLAP workloads in main-memory databases

  • Juchang Lee
  • Wook-Shin Han
  • Hyoung Jun Na
  • Chang Gyoo Park
  • Kyu Hwan Kim
  • Deok Hoe Kim
  • Joo Yeon Lee
  • Sang Kyun Cha
  • SeungHyun Moon
Regular Paper

Abstract

Modern in-memory database systems must efficiently support mixed OLTP and OLAP workloads. The conventional approach relies on ETL-style, application-driven data replication between two separate OLTP and OLAP systems, which sacrifices real-time reporting on operational data. An alternative is to run both workloads on a single machine, which ultimately limits scalability. To address this problem, we propose a novel database replication architecture called HANA Asynchronous Parallel Table Replication (ATR). ATR serves OLTP workloads on a single primary machine and heavy OLAP workloads on replicas. Row-store formats can be used for OLTP transactions at the primary, while column-store formats are used for OLAP queries at the replicas. ATR is designed to scale OLAP query performance elastically while minimizing both the transaction-processing overhead at the primary and the CPU consumed by replayed transactions at the replicas. ATR employs a novel optimistic lock-free parallel log replay scheme that exploits characteristics of multi-version concurrency control (MVCC) to enable real-time reporting by minimizing the propagation delay between the primary and the replicas. It also supports adaptive query routing based on a query's predefined acceptable staleness range. Through extensive experiments with a concrete implementation available in a commercial product, we demonstrate that ATR achieves sub-second visibility delay even for update-intensive workloads, providing scalable OLAP performance without notable overhead on the primary. In addition, by extending ATR to eager parallel replication, we demonstrate how parallel log replay and its log-less replica recovery mechanism improve run-time transaction performance under eager replication.
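
As a rough illustration of staleness-based query routing, the following Python sketch picks a node for a read-only query from the current lag of each replica. It is not the ATR implementation: the `Replica` type, the `lag_seconds` field, the `route_query` function, the choice of the freshest eligible replica, and the fallback to the primary are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Replica:
    """Hypothetical view of a replica as seen by a query router."""
    name: str
    lag_seconds: float  # assumed metric: how far the replica trails the primary


def route_query(replicas: Sequence[Replica], max_staleness_seconds: float) -> str:
    """Return the name of the node that should serve a read-only query.

    A replica is eligible when its lag is within the query's acceptable
    staleness. Among eligible replicas the freshest one is chosen; if none
    qualifies, the query is sent to the primary (an assumed fallback).
    """
    eligible = [r for r in replicas if r.lag_seconds <= max_staleness_seconds]
    if not eligible:
        return "primary"
    return min(eligible, key=lambda r: r.lag_seconds).name
```

For instance, with replicas lagging 0.2 s and 0.8 s and an acceptable staleness of 0.5 s, the sketch routes the query to the first replica; with a single replica lagging 2 s it falls back to the primary.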

Keywords

Database replication, In-memory database, Scaling out, SAP HANA

Notes

Acknowledgements

The authors would like to acknowledge Hyejeong Lee, Deok Koo Kim, Kyungyul Park, Christian Bensberg, Martin Heidel, Joern Schmidt, Michael Muehle, Mihnea Andrei, Alexander Boehm, and many other colleagues in the HANA development team who supported and helped with the development of ATR. The authors also deeply thank the anonymous VLDB Journal reviewers, who provided invaluable comments and suggested ideas for improving the paper.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. SAP Labs Korea, Seoul, Korea
  2. Seoul National University, Seoul, Korea
  3. Pohang University of Science and Technology, Pohang, Korea
