Backtrack-Based and Window-Oriented Optimistic Failure Recovery in Distributed Stream Processing

Chen, Qiming; Hsu, Meichun; Castellanos, Malu

doi:10.1007/978-3-662-46839-5_4

Qiming Chen¹⁰,
Meichun Hsu¹⁰ &
Malu Castellanos¹⁰

Part of the book series: Lecture Notes in Business Information Processing ((LNBIP,volume 206))

Included in the following conference series:

655 Accesses

Abstract

Support transaction property and fault-tolerance is the key to applying stream processing to industry-scale applications; however the corresponding latency overhead must be minimized for accommodating real-time analytics. This issue has been studied in various contexts. In this work we develop the backtrack failure recovery mechanism to allow a task to roll forward without waiting for acknowledgement from its downstream target tasks in the failure-free case, but to request its upstream source tasks to resend the missing tuples only during failure recovery which is the rare case thus has limited impact on the overall performance. For further reduced latency we extend our solution in another dimension by applying the notion of optimistic checkpointing to stream processing, and propose the Continued stream processing with Window - based Checkpoint and Recovery (CWCR) approach, allowing a task to emit results tuple by tuple continuously but checkpoint in batch, and acknowledge, only once per window (e.g. time window). We also tackle the hard problems found in implementing a transactional layer on-top of an existing stream processing platform. We have implemented the proposed mechanisms on Fontainebleau, the distributed stream analytics infrastructure we built on top of the open-sourced Storm platform. Our experiment results reveal the novelty of the proposed technologies and the feasibility to support fault-tolerance with minimal latency overhead for real-time stream processing.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 34.99; Price excludes VAT (USA)

Softcover Book: USD 44.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Arasu, A., Babu, S., Widom, J.: The CQL continuous query language: semantic foundations and query execution. VLDB J 15(2), 121–142 (2006)
Google Scholar
Abadi, D.J., et al.: The design of the Borealis stream processing engine. In: CIDR (2005)
Google Scholar
Balazinska, M., Balakrishnan, H., Madden, S., Stonebraker, M.: Fault-Tolerance in the Borealis distributed stream processing system. In: SIGMOD 2005 (2005)
Google Scholar
Botan, I., Fischer, P.M., Kossmann, D., Tatbu, N.: Transactional stream processing. In: EDBT 2012 (2012)
Google Scholar
Johnson, D.B., Zwaenepoel, W.: Recovery in distributed systems using optimistic message logging and checkpointing. J. Algorithms 11, 462–491 (1990)
Google Scholar
Chen, Q., Hsu, M., Zeller, H.: Experience in continuous analytics as a service. In: EDBT 2011 (2011)
Google Scholar
Chen, Q., Hsu, M.: Experience in extending query engine for continuous analytics. In: Bach Pedersen, T., Mohania, M.K., Tjoa, A.M. (eds.) DAWAK 2010. LNCS, vol. 6263, pp. 190–202. Springer, Heidelberg (2010)
Chapter Google Scholar
Chen, Q., Hsu, M.: Query engine net for streaming analytics. In: Proceedings of 19th International Conference on Cooperative Information Systems (CoopIS) (2011)
Google Scholar
DeWitt, D.J., Paulson, E., Robinson, E., Naughton, J., Royalty, J., Shankar, S., Krioukov, A.: Clustera: an integrated computation and data management system. In: VLDB 2008 (2008)
Google Scholar
Franklin, M.J., et al.: Continuous analytics: rethinking query processing in a network-effect world. In: CIDR 2009 (2009)
Google Scholar
Gedik, B., Andrade, H., Wu, K.-L., Yu, P.S., Doo, M.C.: SPADE: the system S declarative stream processing engine. In: ACM SIGMOD 2008 (2008)
Google Scholar
Hwang, J.-H., Balazinska, M., et al.: High-availability algorithms for distributed stream processing. In: Proceedings of ICDE 2005, Washington, DC, USA (2005)
Google Scholar
Johnson, D.B., Zwaenepoel, W.: Recovery in distributed systems using optimistic message logging and checkpointing. J. Algorithms 11, 462–491 (1990)
Article MATH MathSciNet Google Scholar
Li, J., Karp, A.: Access control for the services oriented architecture. In: ACM Workshop on Secure Web Services (2007)
Google Scholar
Shah, M.A., Hellerstein, J.M., Brewer, E.: Highly available, fault-tolerant, parallel datafows. In: Proceedings of SIGMOD, New York, USA (2004)
Google Scholar
Prasad Sistla, A., Welch, J.L.: Efficient distributed recovery using message logging. In: Proceedings of the Eighth Annual ACM Symposium on Principles of Distributed Computing (1989)
Google Scholar
Stiegler, M., Li, J., Kambatla, K., Karp, A.: Clusterken: A Reliable Object-Based Messaging Framework to Support Data Center Processing, HPL-2011–44 (2011)
Google Scholar
Tweeter, Transactional topologies (2012). https://github.com/nathanmarz/storm/wiki/Transactional-topologies
Wang, Y.M., Fuchs, W.K.: Optimistic message logging for independent checkpointing in message-passing systems. In: IEEE Symposium on Reliable Distribution System, pp. 147–154 (1992)
Google Scholar

Download references

Author information

Authors and Affiliations

HP Labs, Palo Alto, CA, USA
Qiming Chen, Meichun Hsu & Malu Castellanos

Authors

Qiming Chen
View author publications
You can also search for this author in PubMed Google Scholar
Meichun Hsu
View author publications
You can also search for this author in PubMed Google Scholar
Malu Castellanos
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Qiming Chen .

Editor information

Editors and Affiliations

Hewlett-Packard, Palo Alto, USA
Malu Castellanos
Hitachi Laboratories, Santa Clara, California, USA
Umeshwar Dayal
Aalborg University, Aalborg, Denmark
Torben Bach Pedersen
Intel Labs and MIT, Cambridge, Massachusetts, USA
Nesime Tatbul

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Chen, Q., Hsu, M., Castellanos, M. (2015). Backtrack-Based and Window-Oriented Optimistic Failure Recovery in Distributed Stream Processing. In: Castellanos, M., Dayal, U., Pedersen, T., Tatbul, N. (eds) Enabling Real-Time Business Intelligence. BIRTE BIRTE 2014 2013. Lecture Notes in Business Information Processing, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46839-5_4

Download citation

DOI: https://doi.org/10.1007/978-3-662-46839-5_4
Published: 30 April 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46838-8
Online ISBN: 978-3-662-46839-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics