Datenbank-Spektrum, Volume 19, Issue 3, pp 209–218

Lock-free Data Structures for Data Stream Processing

A Closer Look
  • Alexander Baumstark
  • Constantin Pohl
Focus Article (Schwerpunktbeitrag)

Abstract

Processing data in real time, instead of storing it in tables and reading it back, has led to a specialization of DBMS into the so-called data stream processing paradigm. High throughput and low latency are key requirements for keeping up with varying stream behavior and for reacting quickly to incoming events, and there are many ways to achieve them. On modern hardware, such as server CPUs with tens of cores, a common approach is to parallelize stream queries through multithreading and vectorization. High degrees of parallelism, however, require efficient synchronization mechanisms so that shared memory access scales well with the number of threads.

In this work, we identify the most time-consuming operations in stream processing, using our own stream processing engine PipeFabric as an example. In addition, we present different design principles of lock-free data structures that are suited to overcoming those bottlenecks. Finally, we demonstrate how lock-freedom greatly improves performance for join processing and for tuple exchange between operators under different workloads. Nevertheless, using lock-free data structures efficiently comes with additional effort and pitfalls, which we also discuss in this paper.

Keywords

Concurrent Data Structures, Lock-free, Stream Processing, Parallelism

Copyright information

© Gesellschaft für Informatik e.V. and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Databases and Information Systems Group, TU Ilmenau, Ilmenau, Germany
