A Task Allocation Method for Stream Processing with Recovery Latency Constraint

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Stream processing applications continuously process large volumes of online streaming data in real time or near real time, and they have strict latency constraints. Continuous processing, however, makes them vulnerable to failures, and recovery can slow down the entire processing pipeline and break those constraints. Upstream backup is one of the most widely used fault-tolerance schemes for stream processing systems. It introduces complex backup dependencies among tasks, which makes recovery latency harder to control. Moreover, when dependent tasks are placed on the same processor, they fail together in a processor-level failure, which adds recovery latency and amplifies the impact of the failure. This paper studies the relationship between task allocation and the recovery latency of a stream processing application. We present a correlated failure effect model that describes the recovery latency of a stream topology under processor-level failures for a given task allocation plan. We introduce the recovery-latency aware task allocation problem (RTAP), which seeks task allocation plans for stream topologies that achieve guaranteed recovery latencies. We discuss how RTAP differs from classic task allocation problems and present a heuristic algorithm with a computational complexity of O(n log² n) to solve it. Extensive experiments were conducted to verify the correctness and effectiveness of our approach, which improves resource usage by 15%–20% on average.
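To make the contrast with classic task allocation concrete, the following is a minimal sketch of first-fit-decreasing bin packing extended with a conflict rule. It is not the heuristic presented in the paper. The function name `first_fit_with_conflicts`, the integer load units, and the strict rule that dependent tasks are never co-located are all assumptions made for illustration; the RTAP described above bounds recovery latency and does not simply forbid co-location.

```python
from typing import Dict, Iterable, List, Tuple


def first_fit_with_conflicts(
    loads: Dict[str, int],
    conflicts: Iterable[Tuple[str, str]],
    capacity: int,
) -> List[List[str]]:
    """Assign tasks to processors with first-fit decreasing.

    Illustrative assumption: a pair listed in `conflicts` (for example,
    two tasks linked by a backup dependency) must not share a processor.
    """
    # Build a symmetric conflict map: task -> tasks it must avoid.
    avoid = {task: set() for task in loads}
    for a, b in conflicts:
        avoid[a].add(b)
        avoid[b].add(a)

    processors: List[List[str]] = []  # tasks placed on each processor
    used: List[int] = []              # load already placed on each processor

    # Heaviest tasks first; ties broken by name for a deterministic result.
    for task in sorted(loads, key=lambda t: (-loads[t], t)):
        if loads[task] > capacity:
            raise ValueError(f"task {task!r} exceeds processor capacity")
        for i, members in enumerate(processors):
            fits = used[i] + loads[task] <= capacity
            if fits and avoid[task].isdisjoint(members):
                members.append(task)
                used[i] += loads[task]
                break
        else:
            # No existing processor is both large enough and conflict-free.
            processors.append([task])
            used.append(loads[task])
    return processors


# Hypothetical four-task chain with loads in percent of one processor.
loads = {"src": 50, "map": 40, "join": 30, "sink": 20}
chain = [("src", "map"), ("map", "join"), ("join", "sink")]
plan = first_fit_with_conflicts(loads, chain, capacity=100)
```

In this toy chain, plain first-fit decreasing would place `src` and `map` on the same processor, so a single processor failure would take down both a task and its upstream neighbour. The conflict rule separates each adjacent pair while still using two processors.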

Author information

Corresponding author

Correspondence to Xiao-Hui Wei.

Electronic supplementary material

ESM 1 (PDF, 182 kb)

About this article

Cite this article

Li, HL., Wu, J., Jiang, Z. et al. A Task Allocation Method for Stream Processing with Recovery Latency Constraint. J. Comput. Sci. Technol. 33, 1125–1139 (2018). https://doi.org/10.1007/s11390-018-1876-6

  • DOI: https://doi.org/10.1007/s11390-018-1876-6
