Abstract
Accelerating Hadoop requires handling data placement and job placement efficiently. This work focuses on heterogeneous distributed clusters, where the default Hadoop configuration delivers limited performance for data-intensive jobs. Improving performance in this setting means accounting for node heterogeneity, reducing job latency, and improving the data locality of blocks. For data placement, we use a block rearrangement policy that redistributes data blocks according to each node's processing capability; for job placement, we use node labeling and scheduling schemes. Experimental results show that the proposed model accelerates Hadoop by achieving higher data locality and shorter job completion time than the default configuration and policy.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shah, A., Padole, M. (2020). Hadoop Performance Acceleration by Effective Data and Job Placement. In: Reddy, V., Prasad, V., Wang, J., Reddy, K. (eds) Soft Computing and Signal Processing. ICSCSP 2019. Advances in Intelligent Systems and Computing, vol 1118. Springer, Singapore. https://doi.org/10.1007/978-981-15-2475-2_20
DOI: https://doi.org/10.1007/978-981-15-2475-2_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2474-5
Online ISBN: 978-981-15-2475-2
eBook Packages: Intelligent Technologies and Robotics (R0)