Abstract
Apache Hadoop is an open-source software framework, released under the Apache License 2.0, that supports data-intensive distributed applications. In cloud deployments, consumers no longer deal with complex software and hardware configuration and instead pay for cloud services on demand, so the performance of the cloud platform becomes more important in a consumer-centric environment. Slow tasks are unevenly distributed across nodes, and the resulting straggling tasks have a large impact on the Hadoop framework. Speculative execution (SE) handles stragglers by monitoring task progress in real time and copying potential stragglers to a different node, which raises the probability that a backup task finishes before the original one. At present, however, the performance of the Hadoop system is unsatisfactory because the current SE policy misjudges which tasks are stragglers and selects inappropriate backup nodes. This paper proposes an optimized SE strategy based on near-data prediction. In the first step, the strategy gathers real-time task execution information and predicts the remaining runtime of each task with a local prediction method. In the second step, it chooses a suitable backup node according to data proximity and actual demand. The strategy also includes a cost-effectiveness model to get the most out of SE. The results show that using this strategy in Hadoop improves the accuracy of backup-task selection and performs better in heterogeneous Hadoop environments across various situations, which benefits both consumers and the cloud platform.
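The two steps and the cost check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the local prediction method, so a least-squares line fitted to the most recent progress samples stands in for it, and the names `Node`, `predict_remaining`, `choose_backup_node`, `should_speculate`, and the `margin` parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    """Hypothetical view of a worker node as seen by the scheduler."""
    name: str
    has_local_data: bool  # holds a replica of the task's input split
    free_slots: int
    load: float           # 0.0 (idle) to 1.0 (saturated)


def predict_remaining(samples: Sequence[Tuple[float, float]],
                      window: int = 4) -> Optional[float]:
    """Estimate seconds left from (time_sec, progress) samples.

    Assumption: a least-squares line over the last `window` samples
    approximates the task's current progress rate.
    """
    recent = list(samples)[-window:]
    if len(recent) < 2:
        return None
    n = len(recent)
    mean_t = sum(t for t, _ in recent) / n
    mean_p = sum(p for _, p in recent) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in recent)
    if var_t == 0:
        return None
    rate = sum((t - mean_t) * (p - mean_p) for t, p in recent) / var_t
    if rate <= 0:
        return None  # stalled task: no finite estimate
    return (1.0 - recent[-1][1]) / rate


def choose_backup_node(nodes: Sequence[Node]) -> Optional[Node]:
    """Prefer nodes that hold the input data, then the least loaded one."""
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda n: (not n.has_local_data, n.load))


def should_speculate(remaining: float, backup_runtime: float,
                     margin: float = 1.2) -> bool:
    """Placeholder cost check: back up only if the original is expected
    to take clearly longer than a fresh copy would."""
    return remaining > margin * backup_runtime


if __name__ == "__main__":
    progress = [(0, 0.0), (10, 0.1), (20, 0.2), (30, 0.3)]
    nodes = [Node("A", True, 1, 0.9), Node("B", False, 2, 0.1)]
    left = predict_remaining(progress)
    target = choose_backup_node(nodes)
    if left is not None and target is not None and should_speculate(left, 40.0):
        print(f"launch backup on {target.name}, est. {left:.0f}s remaining")
```

Sorting candidates by data locality before load reflects the near-data idea: a backup task that reads its input locally avoids a network transfer. A real scheduler would likely weigh both factors rather than rank them strictly.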
M. Sun and X. Wu contributed equally to this paper and are joint first authors.
Acknowledgement
This work has received funding from 5150 Spring Specialists (05492018012, 05762018039); the Major Program of the National Social Science Fund of China (Grant No. 17ZDA092); the 333 High-Level Talent Cultivation Project of Jiangsu Province (BRA2018332); the Royal Society of Edinburgh, UK, and the China Natural Science Foundation Council (RSE Reference: 62967_Liu_2018_2) under their Joint International Projects funding scheme; and the Basic Research Programs (Natural Science Foundation) of Jiangsu Province (BK20191398).
Copyright information
© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Sun, M., Wu, X., Jin, D., Xu, X., Liu, Q., Liu, X. (2020). Near-Data Prediction Based Speculative Optimization in a Distribution Environment. In: Zhang, X., Liu, G., Qiu, M., Xiang, W., Huang, T. (eds) Cloud Computing, Smart Grid and Innovative Frontiers in Telecommunications. CloudComp 2019, SmartGift 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 322. Springer, Cham. https://doi.org/10.1007/978-3-030-48513-9_9
DOI: https://doi.org/10.1007/978-3-030-48513-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-48512-2
Online ISBN: 978-3-030-48513-9
eBook Packages: Computer Science, Computer Science (R0)