Abstract
Like other emerging fields, Stream Processing Engines (SPEs) pose several challenges to the researchers such as resource awareness, dynamic configurations, heterogeneous clusters, and load balancing. All of these aspects play a major role in the job scheduling process. Inefficiency in any of them causes problems for achieving the maximum throughput. SPEs must contemplate other aspects like resource provisioning, job’s computation requirement, physical distance between communicating nodes, etc. Currently, SPEs ignore topology’s structure as well as inter-executor traffic while scheduling. Due to this, frequently communicating tasks may end up at different computing nodes which increases network latency. In this paper, A3-Storm, a scheduler, based on topology and traffic is proposed that optimizes resource usage for heterogeneous clusters. The aim is to improve efficiency using resource-aware task assignments that results in enhanced throughput and resource utilization. A3-Storm schedules topology using inter-executor traffic and supervisor node’s computing power. A3-Storm is divided into two phases: in the first phase, executors are logically grouped to minimize inter-group communication traffic according to the topology structure or inter-executor traffic. In the second phase, these groups are assigned to physical nodes starting from the most powerful node. Apache Storm (a popular open-source SPE) is used for the implementation of A3-Storm. Results are generated with the help of 2 benchmark topologies, and results are compared with 3 state-of-the-art algorithms. Extensive experiment results show up to 25% and 12% improvement in throughput as compared to the default Storm scheduler and resource-aware scheduler, respectively, with a significant amount of resource savings through consolidation.
This is a preview of subscription content, access via your institution.




























Notes
References
- 1.
Chen M, Mao S, Liu Y (2014) Big data: a survey. Mob Netw Appl 19:171–209. https://doi.org/10.1007/s11036-013-0489-0
- 2.
Alotaibi S, Mehmood R, Katib I (2020) The role of big data and twitter data analytics in healthcare supply chain management. Springer, Cham, pp 267–279
- 3.
Casado R, Younas M (2015) Emerging trends and technologies in big data processing. Concurr Comput 27:2078–2091. https://doi.org/10.1002/cpe.3398
- 4.
Apache (2015) Storm. Apache storm. In: Apache. http://storm-project.net/. Accessed Mar 2020
- 5.
Apache Software Foundation (2014) S4 incubation status—Apache incubator. http://incubator.apache.org/projects/s4.html. Accessed Mar 2020
- 6.
Apache Software Foundation (2018) Apache Spark™—unified analytics engine for big data. In: Apache Spark. https://spark.apache.org/. Accessed Mar 2020
- 7.
Philip Chen CL, Zhang CY (2014) Data-intensive applications, challenges, techniques and technologies: a survey on Big Data. Inf Sci (NY) 275:314–347. https://doi.org/10.1016/j.ins.2014.01.015
- 8.
Apache Storm Scheduler. http://storm.apache.org/releases/2.0.0/Storm-Scheduler.html. Accessed 19 Mar 2020
- 9.
Aniello L, Baldoni R, Querzoni L (2013) Adaptive online scheduling in storm. In: DEBS 2013—Proceedings of the 7th ACM International Conference on Distributed Event-Based Systems. ACM, pp 207–218
- 10.
Eskandari L, Mair J, Huang Z, Eyers D (2018) Poster: iterative scheduling for distributed stream processing systems. In: DEBS 2018—Proceedings of the 12th ACM International Conference on Distributed and Event-Based Systems. ACM Press, New York, New York, USA, pp 234–237
- 11.
Eskandari L, Mair J, Huang Z, Eyers D (2018) T3-Scheduler: a topology and Traffic aware two-level Scheduler for stream processing systems in a heterogeneous cluster. Futur Gener Comput Syst 89:617–632. https://doi.org/10.1016/j.future.2018.07.011
- 12.
Light J (2017) Energy usage profiling for green computing. In: Proceeding of the IEEE International Conference on Computing, Communication and Automation, ICCCA 2017 2017-Janua, pp 1287–1291. https://doi.org/10.1109/CCAA.2017.8230017
- 13.
Liu X, Buyya R (2018) D-Storm: dynamic resource-efficient scheduling of stream processing applications. In: Proceedings of the International Conference on Parallel Distributed System—ICPADS 2017-Decem, pp 485–492. https://doi.org/10.1109/ICPADS.2017.00070
- 14.
Fan J, Chen H, Hu F (2016) Adaptive task scheduling in storm. In: Proceedings of 2015 4th International Conference on Computer Science and Network Technology, ICCSNT 2015. IEEE, pp 309–314
- 15.
Peng B, Hosseini M, Hong Z, et al (2015) R-storm: resource-aware scheduling in storm. In: Middleware 2015—Proceedings of the 16th Annual Middleware Conference. ACM, pp 149–161
- 16.
Weng Z, Guo Q, Wang C et al (2017) AdaStorm: resource efficient storm with adaptive configuration. In: Proceedings of the International Conference on Data Engineering. IEEE, pp 1363–1364
- 17.
Li C, Zhang J (2017) Real-time scheduling based on optimized topology and communication traffic in distributed real-time computation platform of storm. J Netw Comput Appl 87:100–115. https://doi.org/10.1016/j.jnca.2017.03.007
- 18.
Eskandari L, Huang Z, Eyers D (2016) P-scheduler: adaptive hierarchical scheduling in Apache Storm. In: ACM International Conference Proceeding Series. ACM, pp 1–10
- 19.
Xu J, Chen Z, Tang J, Su S (2014) T-storm: traffic-aware online scheduling in storm. In: Proceedings of the International Conference on Distributed Computing Systems. IEEE, pp 535–544
- 20.
Apache Zookeeper (2016) Apache ZooKeeper—Home. https://zookeeper.apache.org/. Accessed Mar 2020
- 21.
Noll MG (2012) Understanding the parallelism of a storm topology what is storm ? http://storm.apache.org/releases/2.1.0/Understanding-the-parallelism-of-a-Storm-topology.html. Accessed Mar 2020
- 22.
Van Der Veen JS, Van Der Waaij B, Lazovik E et al (2015) Dynamically scaling apache storm for the analysis of streaming data. In: Proceedings of the 2015 IEEE 1st International Conference on Big Data Computing Service and Applications, BigDataService 2015. IEEE, pp 154–161
- 23.
Zhang J, Li C, Zhu L, Liu Y (2017) The real-time scheduling strategy based on traffic and load balancing in storm. In: Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications, 14th IEEE International Conference on Smart City and 2nd IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2016. IEEE, pp 372–379
- 24.
Smirnov P, Melnik M, Nasonov D (2017) Performance-aware scheduling of streaming applications using genetic algorithm. Procedia Comput Sci 108:2240–2249. https://doi.org/10.1016/j.procs.2017.05.249
- 25.
Madsen KGS, Zhou Y (2015) Dynamic resource management in a Massively Parallel Stream Processing Engine. In: International Conference on Information and Knowledge Management, Proceedings. ACM, pp 13–22
- 26.
Khalid YN, Aleem M, Prodan R et al (2018) E-OSched: a load balancing scheduler for heterogeneous multicores. J Supercomput 74:5399–5431. https://doi.org/10.1007/s11227-018-2435-1
- 27.
Floating Point Operations Per Second (FLOPS) Definition. https://techterms.com/definition/flops. Accessed Mar 2020
- 28.
Storm A (2014) Isolation scheduler. In: GitHub. https://storm.incubator.apache.org/2013/01/11/storm082-released.html. Accessed Mar 2020
- 29.
Storm topology explained using word count topology example | CoreJavaGuru. http://www.corejavaguru.com/bigdata/storm/word-count-topology. Accessed 19 Mar 2020
- 30.
Exclamation topology. https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/ExclamationTopology.java. Accessed 19 Mar 2020
- 31.
Karanth S (2014) Mastering Hadoop. https://books.google.com.pk/books?id=IdEGBgAAQBAJ&pg=PT374&lpg=PT374&dq=Exclamation+Topology&source=bl&ots=lVJVSSfjyR&sig=ACfU3U3AEcjALh6n1V8XaQqheeZ0teZ_IQ&hl=en&sa=X&ved=2ahUKEwiolfDIyKHoAhUvSRUIHRyQAQwQ6AEwCXoECC8QAQ#v=onepage&q=ExclamationTopology&f=. Accessed Mar 2020
- 32.
Hussain A, Aleem M, Khan A et al (2018) RALBA: a computation-aware load balancing scheduler for cloud computing. Clust Comput 21:1667–1680. https://doi.org/10.1007/s10586-018-2414-6
Author information
Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Muhammad, A., Aleem, M. A3-Storm: topology-, traffic-, and resource-aware storm scheduler for heterogeneous clusters. J Supercomput 77, 1059–1093 (2021). https://doi.org/10.1007/s11227-020-03289-9
Published:
Issue Date:
Keywords
- Traffic aware
- Topology aware
- Resource aware
- Storm scheduler
- Heterogeneous cluster
- Stream processing engine