Parallel replication across formats for scaling out mixed OLTP/OLAP workloads in main-memory databases

  • Regular Paper
  • The VLDB Journal

Abstract

Modern in-memory database systems need to efficiently support mixed OLTP and OLAP workloads. A conventional approach is to rely on ETL-style, application-driven data replication between two very different OLTP and OLAP systems, which sacrifices real-time reporting on operational data. An alternative approach is to run OLTP and OLAP workloads on a single machine, which eventually limits scalability. To tackle this problem, we propose a novel database replication architecture called HANA Asynchronous Parallel Table Replication (ATR). ATR supports OLTP workloads on one primary machine and heavy OLAP workloads on replicas. Row store formats can be used for OLTP transactions at the primary, while column store formats are used for OLAP analytical queries at the replicas. ATR is designed to support elastic scalability of OLAP query performance while minimizing both the overhead for transaction processing at the primary and the CPU consumption for replayed transactions at the replicas. ATR employs a novel optimistic lock-free parallel log replay scheme that exploits characteristics of multi-version concurrency control (MVCC) to enable real-time reporting by minimizing the propagation delay between the primary and replicas. It also supports adaptive query routing based on a query's predefined acceptable staleness range. Through extensive experiments with a concrete implementation available in a commercial product, we demonstrate that ATR achieves sub-second visibility delay even for update-intensive workloads, providing scalable OLAP performance without notable overhead to the primary. In addition, by extending ATR to eager parallel replication, we demonstrate how parallel log replay and its log-less replica recovery mechanism improve run-time transaction performance under eager replication.
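The staleness-based query routing mentioned in the abstract can be illustrated with a minimal sketch. This is not the ATR implementation: the `Replica` class, the `route_query` function, the timestamp fields, and the fallback to the primary when no replica is fresh enough are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Replica:
    # Hypothetical bookkeeping: commit time (on the primary's clock, in
    # seconds) of the newest transaction this replica has replayed.
    name: str
    last_replayed_commit_ts: float


def route_query(primary_commit_ts: float,
                replicas: Sequence[Replica],
                max_staleness_s: float) -> str:
    """Return the name of the node that should serve a read query.

    Assumption: a replica is eligible when its lag behind the primary's
    latest commit is within the query's acceptable staleness bound.
    """
    fresh = [r for r in replicas
             if primary_commit_ts - r.last_replayed_commit_ts <= max_staleness_s]
    if not fresh:
        # Assumed fallback: no replica is fresh enough, so use the primary.
        return "primary"
    # Among eligible replicas, pick the most up-to-date one.
    return max(fresh, key=lambda r: r.last_replayed_commit_ts).name


replicas = [Replica("replica-1", 99.8), Replica("replica-2", 97.0)]
loose = route_query(100.0, replicas, max_staleness_s=0.5)
strict = route_query(100.0, replicas, max_staleness_s=0.1)
```

With these example values, the query that tolerates 0.5 s of staleness goes to `replica-1`, while the query that tolerates only 0.1 s falls back to the primary. A real router would likely also weigh replica load, which this sketch ignores.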

Acknowledgements

The authors would like to acknowledge Hyejeong Lee, Deok Koo Kim, Kyungyul Park, Christian Bensberg, Martin Heidel, Joern Schmidt, Michael Muehle, Mihnea Andrei, Alexander Boehm, and many other colleagues in the HANA development team who supported and helped ATR development. The authors would also like to thank the anonymous VLDB Journal reviewers, who provided invaluable comments and suggested ideas to improve the content.

Author information

Corresponding author

Correspondence to Juchang Lee.

About this article

Cite this article

Lee, J., Han, WS., Na, H.J. et al. Parallel replication across formats for scaling out mixed OLTP/OLAP workloads in main-memory databases. The VLDB Journal 27, 421–444 (2018). https://doi.org/10.1007/s00778-018-0503-z

  • DOI: https://doi.org/10.1007/s00778-018-0503-z
