Operator Scale Out Using Time Utility Function in Big Data Stream Processing

  • Conference paper
Wireless Algorithms, Systems, and Applications (WASA 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8491)

Abstract

Many important big data applications require real-time processing of arriving data with high scalability, especially IoT applications in which devices generate unbounded streams of data and environments are intrinsically volatile. Most current stream processing systems (SPS), such as Storm or S4, often scale insufficiently because their architecture is based on static configurations. Although considerable research and industry effort has been invested in scaling out operators in SPS, most of it focuses on how to scale out different types of operators on an on-demand infrastructure. Little of it considers when and which operators should be scaled out, even though improper scale out may introduce extra overhead to the system. In this paper, we present a novel approach for finding the bottleneck operator at run time and scaling out only that operator. An algorithm is designed to find the bottleneck operator based on the time utility function (TUF) model. The algorithm uses utility profit, utility penalty, and utility threshold to evaluate the utility accrual of a run-time operator. By rewarding early completions and penalizing missed deadlines, the algorithm scales out an operator when its utility accrual falls below the threshold. Experimental results show that our time-aware utility accrual approach can correctly identify and efficiently scale out the bottleneck operator at run time in a data stream processing system.
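
As a rough illustration of the idea summarized above, the following Python sketch scores each operator by a utility accrual and flags the lowest-scoring operator once it drops below a threshold. The linear utility shape, the windowed mean, the per-tuple deadline, and all names (`Completion`, `utility`, `utility_accrual`, `find_bottleneck`) and parameter values are assumptions made for illustration only; the paper's actual TUF definition and algorithm are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Completion:
    """One processed tuple (hypothetical record, not from the paper)."""
    latency: float   # seconds from arrival to completion
    deadline: float  # seconds allowed; assumed to be > 0


def utility(c: Completion, profit: float = 1.0, penalty: float = 1.0) -> float:
    """Assumed linear TUF: reward early completion, penalize a missed deadline."""
    if c.latency <= c.deadline:
        # Full profit at zero latency, decaying linearly to 0 at the deadline.
        return profit * (1.0 - c.latency / c.deadline)
    return -penalty


def utility_accrual(window: List[Completion],
                    profit: float = 1.0, penalty: float = 1.0) -> float:
    """Mean utility over a non-empty window of recent completions."""
    return sum(utility(c, profit, penalty) for c in window) / len(window)


def find_bottleneck(windows: Dict[str, List[Completion]], threshold: float,
                    profit: float = 1.0, penalty: float = 1.0) -> Optional[str]:
    """Return the operator with the lowest accrual below `threshold`, else None."""
    worst_name, worst_accrual = None, threshold
    for name, window in windows.items():
        if not window:
            continue  # no recent completions, nothing to judge
        accrual = utility_accrual(window, profit, penalty)
        if accrual < worst_accrual:
            worst_name, worst_accrual = name, accrual
    return worst_name


if __name__ == "__main__":
    # Hypothetical operators and measurements.
    windows = {
        "parse": [Completion(0.1, 1.0), Completion(0.2, 1.0)],
        "join": [Completion(0.9, 1.0), Completion(1.5, 1.0)],
    }
    print(find_bottleneck(windows, threshold=0.2))
```

In this sketch only the single worst operator is returned, which mirrors the stated goal of scaling out only the bottleneck rather than every slow operator; how the scale out itself is performed is outside its scope.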

This work is supported by the National 863 Programme (No. 2013AA01A212), “Kernel Software and System for Intelligent Cloud Service and Management Platform”.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Humayoo, M., Zhai, Y., He, Y., Xu, B., Wang, C. (2014). Operator Scale Out Using Time Utility Function in Big Data Stream Processing. In: Cai, Z., Wang, C., Cheng, S., Wang, H., Gao, H. (eds) Wireless Algorithms, Systems, and Applications. WASA 2014. Lecture Notes in Computer Science, vol 8491. Springer, Cham. https://doi.org/10.1007/978-3-319-07782-6_6

  • DOI: https://doi.org/10.1007/978-3-319-07782-6_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07781-9

  • Online ISBN: 978-3-319-07782-6

  • eBook Packages: Computer Science, Computer Science (R0)