Telecommunication Systems, Volume 70, Issue 1, pp 13–25

Designing a Hadoop system based on computational resources and network delay for wide area networks

  • Tomohiro Matsuno
  • Bijoy Chand Chatterjee
  • Nattapong Kitsuwan
  • Eiji Oki
  • Malathi Veeraraghavan
  • Satoru Okamoto
  • Naoaki Yamanaka

Abstract

This paper proposes a Hadoop system for wide area networks that considers both the processing capacity of each slave server and the network delay in order to reduce job processing time. The task allocation scheme in the proposed system divides each job into multiple tasks using suitable splitting ratios and then allocates the tasks to different slaves according to the computational capability of each server and the availability of network resources. We incorporate software-defined networking into the proposed Hadoop system to manage path computation elements and network resources. The performance of the proposed Hadoop system is evaluated experimentally, using a scale-out approach, on fourteen machines located in different parts of the globe. The scale-out experiments run both a single job and multiple jobs on the proposed and conventional Hadoop systems. The testbed and simulation results indicate that the proposed Hadoop system outperforms the conventional Hadoop system in terms of processing time.
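
The idea of choosing splitting ratios from server capacity and network delay can be illustrated with a small sketch. This is not the allocation scheme from the paper; it assumes a simplified model in which slave i finishes at time delays[i] + share_i / rates[i], and it picks shares so that the latest finish time is as small as possible. The function name split_job and the parameters total_work, rates, and delays are hypothetical.

```python
def split_job(total_work, rates, delays):
    """Split total_work across slaves to minimise the latest finish time.

    Assumed model (not taken from the paper): slave i needs
    delays[i] + share_i / rates[i] seconds, where rates[i] > 0 is its
    processing rate and delays[i] >= 0 is its network delay.
    Returns (ratios, finish_time); ratios sum to 1.
    """
    if total_work <= 0 or not rates or len(rates) != len(delays):
        raise ValueError("need positive work and matching, non-empty inputs")
    if any(r <= 0 for r in rates) or any(d < 0 for d in delays):
        raise ValueError("rates must be positive and delays non-negative")

    # Consider slaves in order of increasing delay. A slave is used only
    # if the common finish time exceeds its delay.
    order = sorted(range(len(rates)), key=lambda i: delays[i])
    rate_sum = 0.0
    weighted_delay = 0.0  # running sum of rates[i] * delays[i]
    finish = 0.0
    for k, i in enumerate(order):
        rate_sum += rates[i]
        weighted_delay += rates[i] * delays[i]
        # Finish time if exactly the first k+1 slaves share the job.
        finish = (total_work + weighted_delay) / rate_sum
        is_last = k + 1 == len(order)
        if is_last or finish <= delays[order[k + 1]]:
            break  # the next slave could not start early enough to help

    shares = [max(0.0, r * (finish - d)) for r, d in zip(rates, delays)]
    ratios = [s / total_work for s in shares]
    return ratios, finish


if __name__ == "__main__":
    ratios, finish = split_job(10.0, rates=[1.0, 1.0], delays=[0.0, 2.0])
    print(ratios, finish)
```

Under this model a fast but distant server can receive a smaller share than a slower nearby one, and a server whose delay exceeds the achievable finish time receives no work at all.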

Keywords

Hadoop, Heterogeneous clusters, Jobtracker, Implementation

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. The Department of Computer and Network Engineering, The University of Electro-Communications, Tokyo, Japan
  2. Indraprastha Institute of Information Technology, Delhi, India
  3. Graduate School of Informatics, Kyoto University, Kyoto, Japan
  4. The Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, USA
  5. The Department of Information and Computer Science, Keio University, Tokyo, Japan
