Journal of Computer Science and Technology

, Volume 33, Issue 6, pp 1125–1139 | Cite as

A Task Allocation Method for Stream Processing with Recovery Latency Constraint

  • Hong-Liang Li
  • Jie Wu
  • Zhen Jiang
  • Xiang Li
  • Xiao-Hui WeiEmail author
Regular Paper


Stream processing applications continuously process large amounts of online streaming data in real time or near real time. They have strict latency constraints. However, the continuous processing makes them vulnerable to any failures, and the recoveries may slow down the entire processing pipeline and break latency constraints. The upstream backup scheme is one of the most widely applied fault-tolerant schemes for stream processing systems. It introduces complex backup dependencies to tasks, which increases the difficulty of controlling recovery latencies. Moreover, when dependent tasks are located on the same processor, they fail at the same time in processor-level failures, bringing extra recovery latencies that increase the impacts of failures. This paper studies the relationship between the task allocation and the recovery latency of a stream processing application. We present a correlated failure effect model to describe the recovery latency of a stream topology in processor-level failures under a task allocation plan. We introduce a recovery-latency aware task allocation problem (RTAP) that seeks task allocation plans for stream topologies that will achieve guaranteed recovery latencies. We discuss the difference between RTAP and classic task allocation problems and present a heuristic algorithm with a computational complexity of O(n log2 n) to solve the problem. Extensive experiments were conducted to verify the correctness and effectiveness of our approach. It improves the resource usage by 15%–20% on average.


stream processing task allocation fault-tolerance upstream backup recovery latency 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Supplementary material

11390_2018_1876_MOESM1_ESM.pdf (182 kb)
ESM 1 (PDF 182 kb)


  1. [1]
    Stonebraker M, Çetintemel U, Zdonik S. The 8 requirements of real-time stream processing. ACM SIGMOD Record, 2005, 34(4): 42-47.CrossRefGoogle Scholar
  2. [2]
    Arasu A, Babcock B, Babu S, Datar M, Ito K, Nishizawa I et al. STREAM: The Stanford stream data manager (demonstration description). In Proc. ACM SIGMOD International Conference on Management of Data, June 2003, pp.665-665.Google Scholar
  3. [3]
    Hesse G, Lorenz M. Conceptual survey on data stream processing systems. In Proc. the 21st IEEE International Conference on Parallel and Distributed Systems, January 2015, pp.797-802.Google Scholar
  4. [4]
    Chandrasekaran S, Cooper O, Deshpande A et al. TelegraphCQ: Continuous dataflow processing. In Proc. the 2003 ACM SIGMOD International Conference on Management of Data, June 2003, pp.668-668.Google Scholar
  5. [5]
    Akidau T, Balikov A, Bekiroğlu K, Chernyak S et al. Mill-Wheel: Fault-tolerant stream processing at Internet scale. Proceedings of the VLDB Endowment, 2013, 6(11): 1033-1044.CrossRefGoogle Scholar
  6. [6]
    Toshniwal A, Taneja S, Shukla A, Ramasamy K et al. Storm@ Twitter. In Proc. ACM SIGMOD International Conference on Management of Data, June 2014, pp.147-156.Google Scholar
  7. [7]
    Neumeyer L, Robbins B, Nair A, Kesari A. S4: Distributed stream computing platform. In Proc. IEEE International Conference on Data Mining Workshops, Dec. 2010, pp.170-177.Google Scholar
  8. [8]
    Kulkarni S, Bhagat N, Fu M, Kedigehalli V et al. Twitter heron: Stream processing at scale. In Proc. ACM SIGMOD International Conference on Management of Data, May 2015, pp.239-250.Google Scholar
  9. [9]
    Zhao J, Ou S, Hu L, Ding Y, Xu G. A heuristic placement selection approach of partitions of mobile applications in mobile cloud computing model based on community collaboration. Cluster Computing, 2017, 20(4): 3131-3146.CrossRefGoogle Scholar
  10. [10]
    Eidenbenz R, Locher T. Task allocation for distributed stream processing. In Proc. the 35th Annual IEEE International Conference on Computer Communications, April 2016.Google Scholar
  11. [11]
    Hwang J, Balazinska M, Rasin A, Cetintemel U, Stonebraker M, Zdonik S. High-availability algorithms for distributed stream processing. In Proc. the 21st IEEE International Conference on Data Engineering, April 2005, pp.779-790.Google Scholar
  12. [12]
    Babcock B, Babu S, Motwani R, Datar M. Chain: Operator scheduling for memory minimization in data stream systems. In Proc. ACM SIGMOD International Conference on Management of Data, June 2003, pp.253-264.Google Scholar
  13. [13]
    Cardellini V, Grassi C, Presti L F, Nardelli M. Optimal operator placement for distributed stream processing applications. In Proc. the 10th ACM International Conference on Distributed and Event-Based Systems, June 2016, pp.69-80.Google Scholar
  14. [14]
    Chatzistergiou A, Viglas S D. Fast heuristics for near-optimal task allocation in data stream processing over clusters. In Proc. the 23rd ACM International Conference on Information and Knowledge Management, Nov. 2014, pp.1579-1588.Google Scholar
  15. [15]
    Lohrmann B, Janacik P, Kao O. Elastic stream processing with latency guarantees. In Proc. the 35th IEEE Distributed Computing Systems, June 2015, pp.399-410.Google Scholar
  16. [16]
    Li H, Wu J, Jiang Z, Li X, Wei X, Zhuang Y. Integrated recovery and task allocation for stream processing. In Proc. the 36th IEEE Performance Computing and Communications Conference, Dec. 2017.Google Scholar
  17. [17]
    Li H, Wu J, Jiang Z, Li X, Wei X. Minimum backups for stream processing with recovery latency guarantees. IEEE Transactions on Reliability, 2017, 66(3): 783-94.CrossRefGoogle Scholar
  18. [18]
    Sun D, Zhang G, Wu C, Li K, Zheng W. Building a fault tolerant framework with deadline guarantee in big data stream computing environments. Journal of Computer and System Sciences, 2017, 89: 4-23.MathSciNetCrossRefGoogle Scholar
  19. [19]
    Krempl G, Žliobaite I, Brzeziński D, Hüllermeier E et al. Open challenges for data stream mining research. ACM SIGKDD Explorations Newsletter, 2014, 16(1): 1-10.CrossRefGoogle Scholar
  20. [20]
    Li H, Wu J, Jiang Z, Li X, Wei X. Task allocation for stream processing with recovery latency guarantee. In Proc. IEEE International Conference on Cluster Computing, Sept. 2017, pp.379-383.Google Scholar
  21. [21]
    Ananthanarayanan R, Basker V, Das S, Gupta A, Jiang H, Qiu T, Reznichenko A, Ryabkov D, Singh M, Venkataraman S. Photon: Fault-tolerant and scalable joining of continuous data streams. In Proc. ACM SIGMOD International Conference on Management of Data, June 2013, pp.577-588.Google Scholar
  22. [22]
    Qian Z, He Y, Su C, Wu Z, Zhu H, Zhang T, Zhou L, Yu Y, Zhang Z. TimeStream: Reliable stream computation in the cloud. In Proc. the 8th ACM European Conference on Computer Systems, Apr. 2013, pp.1-14.Google Scholar
  23. [23]
    Su L, Zhou Y. Tolerating correlated failures in massively parallel stream processing engines. In Proc. the 32nd IEEE International Conference on Data Engineering, May 2016, pp.517-528.Google Scholar
  24. [24]
    Upadhyaya P, Kwon Y, Balazinska M. A latency and fault-tolerance optimizer for online parallel query plans. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2011, pp.241-252.Google Scholar
  25. [25]
    Fernandez C R, Migliavacca M, Kalyvianaki E, Pietzuch P. Integrating scale out and fault tolerance in stream processing using operator state management. In Proc. ACM SIGMOD International Conference on Management of Data, June 2013, pp.725-736.Google Scholar
  26. [26]
    Balazinska M, Balakrishnan H, Madden S R, Stonebraker M. Fault-tolerance in the Borealis distributed stream processing system. ACM Transactions on Database Systems, 2008, 33(1): Article No. 3.CrossRefGoogle Scholar
  27. [27]
    Heinze T, Zia M, Krahn R, Jerzak Z, Fetzer C. An adaptive replication scheme for elastic data stream processing systems. In Proc. the 9th ACM International Conference on Distributed Event-Based Systems, June 2015, pp.150-161.Google Scholar
  28. [28]
    Stanoi I, Mihaila G, Palpanas T, Lang C. WhiteWater: Distributed processing of fast streams. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(9): 1214-1226.CrossRefGoogle Scholar
  29. [29]
    de Matteis T, Mencagli G. Proactive elasticity and energy awareness in data stream processing. Journal of Systems and Software, 2017, 127: 302-319.CrossRefGoogle Scholar
  30. [30]
    Wu J. Distributed System Design. CRC Press, Inc., 2017.Google Scholar
  31. [31]
    Wu J, Huang K. The balanced hypercube: A cube-based system for fault-tolerant applications. IEEE Transactions on Computers, 1997, 46(4): 484-90.MathSciNetCrossRefGoogle Scholar
  32. [32]
    Carbone P, Katsifodimos A, Ewen S, Markl V, Haridi S, Tzoumas K. Apache Flink™: Stream and batch processing in a single engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2015, 36(4): 28-38.Google Scholar
  33. [33]
    Zhuang Y, Wei X, Li H, Wang Y, He X. An optimal check-pointing model with online OCI adjustment for stream processing applications. In Proc. the 27th IEEE International Conference on Computer Communication and Networks, July 2018.Google Scholar
  34. [34]
    Salama A, Binnig C, Kraska T, Zamanian E. Cost-based fault-tolerance for parallel data processing. In Proc. ACM SIGMOD International Conference on Management of Data, May 2015, pp.285-297.Google Scholar
  35. [35]
    Young J W. A first order approximation to the optimum checkpoint interval. Communications of the ACM, 1974, 17(9): 530-531.CrossRefGoogle Scholar
  36. [36]
    Xu H, Xing L, Huang L. Regional science and technology resource allocation optimization based on improved genetic algorithm. KSII Transactions on Internet & Information Systems, 2017, 11(4): 1972-1986.Google Scholar
  37. [37]
    Jin Z, Xu G, Li Y, Liu P. A novel cloud scheduling algorithm optimization for energy consumption of data centres based on user QoS priori knowledge under the background of WSN and mobile communication. Cluster Computing, 2017, 20(2): 1587-1597.CrossRefGoogle Scholar
  38. [38]
    Christensen H I, Khan A, Pokutta S, Tetali P. Approximation and online algorithms for multidimensional bin packing: A survey. Computer Science Review, 2017, 24: 63-79.MathSciNetCrossRefGoogle Scholar
  39. [39]
    Lodi A, Martello S, Monaci M. Two-dimensional packing problems: A survey. European Journal of Operational Research, 2002, 141(2): 241-252.MathSciNetCrossRefGoogle Scholar
  40. [40]
    Coffman Jr E G, Csirik J, Galambos G, Martello S, Vigo D. Bin packing approximation algorithms: Survey and classification. In Handbook of Combinatorial Optimization, Pardalos P A, Du D Z, Graham R L (eds.), Springer, 2013, pp.455-531.Google Scholar
  41. [41]
    Johnson D S. Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, 1974, 9(3): 256-278.MathSciNetCrossRefGoogle Scholar
  42. [42]
    Chung F R, Garey M R, Johnson D S. On packing two-dimensional bins. SIAM Journal on Algebraic Discrete Methods, 1982, 3(1): 66-76.MathSciNetCrossRefGoogle Scholar
  43. [43]
    Garey M R, Graham R L, Ullman J D. Worst-case analysis of memory allocation algorithms. In Proc. the 4th ACM Symposium on Theory of computing, May 1972, pp.143-150.Google Scholar
  44. [44]
    Jiang Y, Huang Z, Tsang D H. Towards max-min fair resource allocation for stream big data analytics in shared clouds. IEEE Transactions on Big Data, 2016, 4(1): 130-137.CrossRefGoogle Scholar
  45. [45]
    Kreps J, Narkhede N, Rao J. Kafka: A distributed messaging system for log processing. In Proc. the 6th Workshop on Networking Meets Databases, June 2011.Google Scholar
  46. [46]
    Sadykov R, Vanderbeck F. Bin packing with conflicts: A generic branch-and-price algorithm. INFORMS Journal on Computing, 2013, 25(2): 244-255.MathSciNetCrossRefGoogle Scholar
  47. [47]
    Koukoumidis E, Peh L S, Martonosi M R. SignalGuru: Leveraging mobile phones for collaborative traffic signal schedule advisory. In Proc. the 9th ACM International Conference on Mobile Systems, Applications, and Services, June 2011, pp.127-140.Google Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Hong-Liang Li
    • 1
    • 2
    • 3
  • Jie Wu
    • 3
  • Zhen Jiang
    • 4
  • Xiang Li
    • 1
  • Xiao-Hui Wei
    • 1
    • 2
    Email author
  1. 1.College of Computer Science and TechnologyJilin UniversityChangchunChina
  2. 2.Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of EducationChangchunChina
  3. 3.Department of Computer and Information SciencesTemple UniversityPhiladelphiaU.S.A.
  4. 4.Department of Computer ScienceWest Chester University of PennsylvaniaWest ChesterU.S.A.

Personalised recommendations