Abstract
Many important big data applications require real-time processing of arriving data with high scalability, especially IoT applications in which devices generate unbounded data streams and environments are intrinsically volatile. Most current stream processing systems (SPS), such as Storm or S4, often scale poorly because their architecture is based on static configurations. Although considerable research and industry effort has been invested in scaling out operators in SPS, most of this work focuses on how to scale out different types of operators on an on-demand infrastructure. Little of it considers when to scale out and which operators to scale out, even though improper scale-out may introduce extra overhead to the system. In this paper, we present a novel approach that finds the bottleneck operator at run time and scales out only that operator. We design an algorithm that identifies the bottleneck operator based on a time utility function (TUF) model. The algorithm uses utility profit, utility penalty, and a utility threshold to evaluate the utility accrual of a running operator. By rewarding early completions and penalizing missed deadlines, the algorithm scales out an operator when its utility accrual falls below the threshold. Experimental results show that our time-aware utility accrual approach can accurately identify and efficiently scale out the bottleneck operator at run time in a data stream processing system.
This work is supported by the National 863 Programme (No. 2013AA01A212), “Kernel Software and System for Intelligent Cloud Service and Management Platform”.
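The threshold test described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the linearly decaying profit, the flat penalty, the averaging over a window of per-tuple latencies, and every name used here (`TufParams`, `tuple_utility`, `utility_accrual`, `find_bottleneck`) are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TufParams:
    # All four fields are hypothetical parameters, not values from the paper.
    deadline: float    # per-tuple processing deadline, in seconds
    max_profit: float  # utility earned by an instantaneous completion
    penalty: float     # utility lost when the deadline is missed
    threshold: float   # accrual below this level triggers scale-out


def tuple_utility(latency: float, p: TufParams) -> float:
    """Reward early completion with linearly decaying profit; penalize a miss."""
    if latency <= p.deadline:
        return p.max_profit * (1.0 - latency / p.deadline)
    return -p.penalty


def utility_accrual(latencies: List[float], p: TufParams) -> float:
    """Average utility over a window of observed per-tuple latencies."""
    if not latencies:
        return p.max_profit  # an idle operator has missed nothing
    return sum(tuple_utility(lat, p) for lat in latencies) / len(latencies)


def find_bottleneck(window: Dict[str, List[float]], p: TufParams) -> Optional[str]:
    """Return the operator with the lowest accrual if it is below the threshold."""
    if not window:
        return None
    accruals = {op: utility_accrual(lats, p) for op, lats in window.items()}
    worst = min(accruals, key=accruals.get)
    return worst if accruals[worst] < p.threshold else None


params = TufParams(deadline=1.0, max_profit=1.0, penalty=1.0, threshold=0.2)
observed = {"parse": [0.1, 0.2], "join": [0.9, 1.5, 2.0]}
candidate = find_bottleneck(observed, params)  # operator to scale out, if any
```

In this sketch only the single worst operator is returned, which mirrors the idea of scaling out only the bottleneck instead of every slow operator.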
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Humayoo, M., Zhai, Y., He, Y., Xu, B., Wang, C. (2014). Operator Scale Out Using Time Utility Function in Big Data Stream Processing. In: Cai, Z., Wang, C., Cheng, S., Wang, H., Gao, H. (eds) Wireless Algorithms, Systems, and Applications. WASA 2014. Lecture Notes in Computer Science, vol 8491. Springer, Cham. https://doi.org/10.1007/978-3-319-07782-6_6
DOI: https://doi.org/10.1007/978-3-319-07782-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07781-9
Online ISBN: 978-3-319-07782-6
eBook Packages: Computer Science, Computer Science (R0)