Cluster Computing, Volume 22, Supplement 3, pp 5975–5985

The bandwidth-aware backup task scheduling strategy using SDN in Hadoop

  • Fengjun Shang (Email author)
  • Xuanling Chen
  • Chenyun Yan
  • Luzhong Li
  • Yuting Zhao


In the era of big data, traditional computing and storage capacity can no longer meet growing demand, and cloud computing has emerged in response. From the perspective of resource allocation and management, improving task scheduling is one way to improve the performance of a Hadoop system. This paper improves the speculative task scheduling strategy by using SDN technology. Under the LATE mechanism, a speculative (backup) task may finish later than the slow task it duplicates. Such a backup not only fails to reduce task turnaround time but also wastes system resources. We therefore add to LATE's speculative task scheduling strategy a comparison between the slow task and its speculative task. In this comparison, the run time of the speculative task includes the input data transfer time, which depends on the real-time bandwidth of the corresponding link. On this basis, we propose a bandwidth-aware speculative task run time estimation model (BWRE) based on SDN, and use it to estimate the backup task's run time accurately. We also use SDN to provide bandwidth guarantees for the speculative task. Finally, BWRE is verified by simulation experiments. Evaluation results show that BWRE shortens job turnaround time by an average of 9.85%.
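The comparison between a slow task and its backup can be illustrated with a minimal sketch. It is not the BWRE model itself, whose details the abstract does not give. The remaining-time estimate follows the usual LATE heuristic (remaining work divided by observed progress rate); the names `SlowTask`, `backup_run_time_s`, and `should_launch_backup`, and the assumption that the backup's compute time is known in advance, are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SlowTask:
    progress: float   # fraction of work completed, between 0 and 1
    elapsed_s: float  # seconds since the task started


def remaining_time_s(task: SlowTask) -> float:
    """LATE-style estimate: (1 - progress) / progress rate."""
    if task.progress <= 0 or task.elapsed_s <= 0:
        return float("inf")  # no progress observed yet, so no finite estimate
    rate = task.progress / task.elapsed_s
    return (1.0 - task.progress) / rate


def backup_run_time_s(input_bytes: int, bandwidth_bps: float, compute_s: float) -> float:
    """Assumed model: input transfer time at the link's current bandwidth plus compute time."""
    transfer_s = input_bytes * 8 / bandwidth_bps
    return transfer_s + compute_s


def should_launch_backup(task: SlowTask, input_bytes: int,
                         bandwidth_bps: float, compute_s: float) -> bool:
    """Launch a backup only if it is expected to finish before the slow task."""
    return backup_run_time_s(input_bytes, bandwidth_bps, compute_s) < remaining_time_s(task)


if __name__ == "__main__":
    task = SlowTask(progress=0.25, elapsed_s=50.0)
    block = 128 * 10**6  # one 128 MB input block (illustrative size)
    print(should_launch_backup(task, block, 100e6, 100.0))  # fast link
    print(should_launch_backup(task, block, 10e6, 100.0))   # congested link
```

In this sketch the same slow task gets a backup on the fast link but not on the congested one, which is the situation a bandwidth-aware estimate is meant to distinguish. In a real deployment the bandwidth value would come from the SDN controller's link statistics rather than a constant.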


Keywords: Hadoop, Task scheduling, SDN, LATE, MapReduce



This work was supported by the National Natural Science Foundation of China (No. 61672004) and the Chongqing Research Program of Basic Research and Frontier Technology under Grant No. cstc2016jcyjA0590.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Fengjun Shang (1), Email author
  • Xuanling Chen (1)
  • Chenyun Yan (1)
  • Luzhong Li (1)
  • Yuting Zhao (1)

  1. College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
