Scalable Performance Modeling and Evaluation of MapReduce Applications

Karimian-Aliabadi, Soroush; Ardagna, Danilo; Entezari-Maleki, Reza; Movaghar, Ali

doi:10.1007/978-3-030-33495-6_34

Part of the book series: Communications in Computer and Information Science (CCIS, volume 891)

Included in the following conference series:

International Congress on High-Performance Computing and Big Data Analysis

Abstract

Big Data frameworks are becoming complex systems that must cope with the increasing rate and diversity of data production in today's applications. This increases the number of variables and parameters that must be set for the framework to perform well. Therefore, an accurate performance model is necessary to estimate the execution time before actually executing the application. Two prominent Big Data frameworks are Hadoop and Spark, for which multiple performance models have been proposed in the literature. Unfortunately, these models do not scale well enough to keep up with the increasing size and complexity of the frameworks and of the underlying infrastructures used in production environments. In this paper we propose a scalable Lumped SRN model to predict the execution time of multi-stage MapReduce and Spark applications, and validate the model against experiments on the TPC-DS benchmark run at the CINECA Italian supercomputing center. Results show that the proposed model enables analysis of multiple simultaneous jobs, with multiple users and stages for each job, in reasonable time, and predicts the execution time of an application with an average error of about 14.5%.

Acknowledgment

The results of this work have been partially funded by the European DICE H2020 research project (grant agreement no. 644869).

Author information

Authors and Affiliations

Computer Engineering Department, Sharif University of Technology, Tehran, Iran
Soroush Karimian-Aliabadi & Ali Movaghar
Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
Danilo Ardagna
School of Computer Engineering, Iran University of Science and Technology (IUST), Tehran, Iran
Reza Entezari-Maleki

Corresponding author

Correspondence to Soroush Karimian-Aliabadi.

Editor information

Editors and Affiliations

University of Calabria, Rende, Italy
Lucio Grandinetti
Kharazmi University, Tehran, Iran
Seyedeh Leili Mirtaheri
University of Calabria, Rende, Italy
Reza Shahbazian

About this paper

Cite this paper

Karimian-Aliabadi, S., Ardagna, D., Entezari-Maleki, R., Movaghar, A. (2019). Scalable Performance Modeling and Evaluation of MapReduce Applications. In: Grandinetti, L., Mirtaheri, S., Shahbazian, R. (eds) High-Performance Computing and Big Data Analysis. TopHPC 2019. Communications in Computer and Information Science, vol 891. Springer, Cham. https://doi.org/10.1007/978-3-030-33495-6_34

DOI: https://doi.org/10.1007/978-3-030-33495-6_34
Published: 20 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33494-9
Online ISBN: 978-3-030-33495-6
eBook Packages: Computer Science, Computer Science (R0)
