Open Streaming Operation Patterns

Chen, Qiming; Hsu, Meichun

doi:10.1007/978-3-642-45315-1_4

Qiming Chen¹⁸ &
Meichun Hsu¹⁸

Part of the book series: Lecture Notes in Computer Science ((TLDKS,volume 8320))

353 Accesses

Abstract

We describe our canonical dataflow operator framework for distributed stream analytics. This framework is characterized by the notion of open-executors. A dataflow process is composed by chained operators which form a graph-structured topology, with each logical operator executed by multiple physical instances running in parallel over distributed server nodes. An open executor supports the streaming operations with specific characteristics and running pattern, but is open for the application logic to be plugged-in. This framework allows us to provide automated and systematic support for executing, parallelizing and granulizing the continuous operations.

We illustrate the power of this approach by solving the following problems: first, how to categorize the meta-properties of stream operators such as the I/O, blocking, data grouping characteristics, for providing unified and automated system support; next, how to elastically and correctly parallelize a stateful operator that is history-sensitive, relying on the prior state and data processing results; how to analyze unbounded stream granularly to ensure sound semantics (e.g. aggregation); and further, how to deal with parallel sliding window based stream processing systematically. These capabilities are not systematically supported in the current generation of stream processing systems, but left to user programs which can result in fragile code, disappointing performance and incorrect results. Instead, solving these problems using open-executors benefits many applications with system guaranteed semantics and reliability.

In general, with the proposed canonical dataflow operator framework we can standardize the operator execution patterns, and to support these patterns systematically and automatically. The value of our approach in real-time, continuous, elastic data-parallel and topological stream analytics has been revealed by the experiment results.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

eBook: USD 16.99; Price excludes VAT (USA)

Softcover Book: USD 16.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Arasu, A., Babu, S., Widom, J., The, C.Q.L.: Continuous Query Language: Semantic Foundations and Query Execution. VLDB Journal 15(2) (June 2006)
Google Scholar
Abadi, D.J., et al.: The Design of the Borealis Stream Processing Engine. In: CIDR (2005)
Google Scholar
Bryant, R.E.: Data-Intensive Supercomputing: The case for DISC. CMU-CS-07-128 (2007)
Google Scholar
Chen, Q., Hsu, M., Zeller, H.: Experience in Continuous analytics as a Service (CaaaS). In: EDBT 2011 (2011)
Google Scholar
Chen, Q., Hsu, M.: Experience in Extending Query Engine for Continuous Analytics. In: Bach Pedersen, T., Mohania, M.K., Tjoa, A.M. (eds.) DAWAK 2010. LNCS, vol. 6263, pp. 190–202. Springer, Heidelberg (2010)
Chapter Google Scholar
Chen, Q., Hsu, M.: Continuous mapReduce for in-DB stream analytics. In: Meersman, R., Dillon, T., Herrero, P. (eds.) OTM 2010. LNCS, vol. 6428, pp. 16–34. Springer, Heidelberg (2010)
Chapter Google Scholar
Dean, J.: Experiences with MapReduce, an abstraction for large-scale computation. In: Int. Conf. on Parallel Architecture and Compilation Techniques. ACM (2006)
Google Scholar
Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D.: Dryad: Distributed data-parallel programs from sequential building blocks. In: EuroSys 2007 (March 2007)
Google Scholar
Franklin, M.J., et al.: Continuous Analytics: Rethinking Query Processing in a Network-Effect World. In: CIDR 2009 (2009)
Google Scholar
Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig Latin: A Not-So-Foreign Language for Data Processing. In: ACM SIGMOD 2008 (2008)
Google Scholar
ØMQ Lightweight Messaging Kernel, http://www.zeromq.org/
Apache ZooKeeper, http://zookeeper.apache.org/
Kryo - Fast, efficient Java serialization, http://code.google.com/p/kryo/
Twitter’s Open Source Storm Finally Hits, http://siliconangle.com/blog/2011/09/20/twitter-storm-finally-hits/

Download references

Author information

Authors and Affiliations

HP Labs Palo Alto, Hewlett Packard Co., California, USA
Qiming Chen & Meichun Hsu

Authors

Qiming Chen
View author publications
You can also search for this author in PubMed Google Scholar
Meichun Hsu
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Austria
Josef Küng & Roland Wagner &

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Chen, Q., Hsu, M. (2013). Open Streaming Operation Patterns. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XII. Lecture Notes in Computer Science, vol 8320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45315-1_4

Download citation

DOI: https://doi.org/10.1007/978-3-642-45315-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45314-4
Online ISBN: 978-3-642-45315-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics