Exactly-Once Semantics with Real-Time Data Pipelines

  • Conference paper
Ambient Communications and Computer Systems

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 696)

Abstract

Real-time systems such as IoT platforms, recommendation systems, and fraud detection systems often need to ensure that the application processes each piece of data only once. In real-time streaming applications, a batch of data may be handed to the application multiple times, so the application ends up processing duplicate data. No stream processing product can unilaterally guarantee exactly-once processing semantics; the guarantee holds only under certain assumptions, or when the application and the stream processing framework collaborate in certain ways. In this paper, we present a design that addresses this problem in real-time streaming applications by achieving end-to-end exactly-once delivery. The main contribution of our work is a solution to the complex task of recovering application state after application restarts, network crashes, and similar failures, and of detecting and filtering out-of-order duplicate data while maintaining high throughput.
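
The duplicate-filtering idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' design, which is not shown in this preview. It assumes that every record carries a unique message ID, and the names `DuplicateFilter` and `process_batch` are hypothetical. The sketch keeps its state in memory only; surviving restarts, as the abstract requires, would additionally need that state to be persisted together with the output.

```python
from collections import OrderedDict


class DuplicateFilter:
    """Remembers the most recent `capacity` message IDs and rejects repeats."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._seen = OrderedDict()  # insertion-ordered set of message IDs

    def is_new(self, message_id):
        """Return True the first time an ID is seen, False on any repeat."""
        if message_id in self._seen:
            return False
        self._seen[message_id] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest remembered ID
        return True


def process_batch(batch, dedup, sink):
    """Append each payload to `sink` unless its ID was already processed.

    `batch` is an iterable of (message_id, payload) pairs. The IDs may
    arrive in any order, and the whole batch may be redelivered.
    """
    for message_id, payload in batch:
        if dedup.is_new(message_id):
            sink.append(payload)


if __name__ == "__main__":
    dedup, sink = DuplicateFilter(), []
    batch = [(1, "a"), (3, "c"), (2, "b")]  # IDs arrive out of order
    process_batch(batch, dedup, sink)
    process_batch(batch, dedup, sink)  # simulated redelivery of the batch
    print(sink)
```

Because the filter is keyed on message ID rather than arrival position, a redelivered batch is dropped even when its records arrive out of order. The bounded capacity is a trade-off: a duplicate that arrives after its ID has been evicted would be processed again.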

Author information

Corresponding author

Correspondence to Avnish Kumar Rastogi.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Rastogi, A.K., Malik, N., Hooda, S. (2018). Exactly-Once Semantics with Real-Time Data Pipelines. In: Perez, G., Tiwari, S., Trivedi, M., Mishra, K. (eds) Ambient Communications and Computer Systems. Advances in Intelligent Systems and Computing, vol 696. Springer, Singapore. https://doi.org/10.1007/978-981-10-7386-1_26

  • DOI: https://doi.org/10.1007/978-981-10-7386-1_26

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7385-4

  • Online ISBN: 978-981-10-7386-1

  • eBook Packages: Engineering (R0)
