Data-Centric Task Scheduling Algorithm for Hybrid Tasks in Cloud Data Centers

  • Xin Li
  • Liangyuan Wang
  • Jemal Abawajy
  • Xiaolin Qin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11335)


With the development of big data, the demand for data analysis keeps increasing. This demand calls for a data-aware task scheduling approach that can efficiently schedule different kinds of tasks, such as batched tasks and real-time tasks, in the same data center. To this end, we propose a hybrid task scheduling strategy coupled with data migration in the data center. First, we translate the task scheduling problem into a task selection problem and give methods for selecting batched tasks and real-time tasks, respectively. Then we describe in detail the method for scheduling batched and real-time tasks together. Finally, we integrate data migration into the hybrid scheduling strategy. Experimental results show that, compared with the traditional FIFO algorithm, the proposed task scheduling strategy greatly improves data locality, and that data migration is effective at reducing job execution time. Our algorithm also guarantees acceptable fairness for tasks.


Data analysis · Data migration · Batched task · Real-time task · Hybrid scheduling



This work is supported in part by the National Natural Science Foundation of China under Grant 61373015, in part by the Jiangsu Natural Science Foundation under Grants BK20160813 and BK20140832, in part by the National Key R&D Program of China under Grant 2018YFB1003902, in part by the Open Project funded by the State Key Laboratory of Computer Architecture under Grant CARCH201710, and in part by a project funded by the China Postdoctoral Science Foundation.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Xin Li (1, 2, 3), Email author
  • Liangyuan Wang (1)
  • Jemal Abawajy (4)
  • Xiaolin Qin (1)

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
  2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  3. Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, China
  4. School of Information Technology, Deakin University, Melbourne, Australia
