OFScheduler: A Dynamic Network Optimizer for MapReduce in Heterogeneous Cluster

International Journal of Parallel Programming

Abstract

MapReduce is a popular programming paradigm in cloud computing due to its excellent scalability for processing large-scale data. However, MapReduce performs poorly in heterogeneous clusters. One reason is that Hadoop's built-in load balancing algorithm for the Map function leads to excessive network traffic. We propose a new dynamic network optimizer called OFScheduler for heterogeneous clusters to relieve the network traffic during the execution of MapReduce jobs. The optimizer focuses on reducing bandwidth competition, balancing the workload of network links, and increasing bandwidth utilization. The proposed optimizer tags different types of traffic and uses OpenFlow to adjust the transfer of flows dynamically. We instantiate a simulator and an OpenFlow testbed for evaluation. The simulation results demonstrate that the proposed optimizer significantly increases bandwidth utilization and improves the performance of MapReduce by 24–63% for most jobs in a multi-path heterogeneous cluster. The experimental results show that the proposed optimizer can be deployed in a real environment.
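As a rough illustration of what "balancing the workload of network links" can mean in a multi-path cluster, the sketch below places a tagged flow on the candidate path whose most loaded link has the most spare bandwidth. It is not the OFScheduler algorithm: the `Flow` and `Link` types, the tag strings, and the greedy widest-bottleneck rule are all assumptions made for this example, and no OpenFlow controller API is involved.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A network link with a fixed capacity and a running load (hypothetical model)."""
    capacity_mbps: float
    load_mbps: float = 0.0

    @property
    def residual_mbps(self) -> float:
        # Spare bandwidth on this link, never negative.
        return max(self.capacity_mbps - self.load_mbps, 0.0)


@dataclass
class Flow:
    """A flow carrying a traffic-type tag (tag names are illustrative only)."""
    tag: str
    demand_mbps: float


def bottleneck(path, links):
    """Residual bandwidth of the most congested link on a path."""
    return min(links[name].residual_mbps for name in path)


def place_flow(flow, candidate_paths, links):
    """Greedily pick the path with the widest bottleneck and record the flow's load."""
    path = max(candidate_paths, key=lambda p: bottleneck(p, links))
    for name in path:
        links[name].load_mbps += flow.demand_mbps
    return path


if __name__ == "__main__":
    # Three single-link paths in a heterogeneous setup: two fast links and one slow link.
    links = {"a": Link(1000.0, 900.0), "b": Link(1000.0, 100.0), "c": Link(100.0)}
    paths = [("a",), ("b",), ("c",)]
    print(place_flow(Flow("shuffle", 50.0), paths, links))
```

A real deployment would replace the in-memory `links` table with statistics collected from switches and would install the chosen route through the controller; the greedy rule here is only one of several reasonable placement policies.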



Acknowledgments

This work was supported in part by the 863 Program of China (No. 2011AA01A202), the Doctoral Fund of the Ministry of Education of China (No. 20100073120022), the Natural Science Foundation of China (No. 61202025), and the STCSM (Grant No. 12ZR1414900). Yao Shen is the corresponding author.

Corresponding author

Correspondence to Yao Shen.

About this article

Cite this article

Li, Z., Shen, Y., Yao, B. et al. OFScheduler: A Dynamic Network Optimizer for MapReduce in Heterogeneous Cluster. Int J Parallel Prog 43, 472–488 (2015). https://doi.org/10.1007/s10766-013-0281-6

  • DOI: https://doi.org/10.1007/s10766-013-0281-6
