A Fuzzy Neural Network Based Dynamic Data Allocation Model on Heterogeneous Multi-GPUs for Large-scale Computations

  • Chao-Long Zhang
  • Yuan-Ping Xu
  • Zhi-Jie Xu
  • Jia He
  • Jing Wang
  • Jian-Hua Adu
Research Article


The parallel computation capabilities of modern graphics processing units (GPUs) have attracted increasing attention from researchers and engineers whose studies demand high computational throughput. However, current single-GPU engineering solutions often struggle to meet real-time requirements. The multi-GPU approach has therefore become a popular and cost-effective way to meet these demands. In such systems, balancing the computational load across multiple GPU “nodes” is often the key bottleneck affecting the quality and performance of the real-time system. Existing load balancing approaches mainly assume that all GPU nodes in the same computer framework have equal computational performance, which is often not the case owing to cluster design and other legacy issues. This paper presents a novel dynamic load balancing (DLB) model for rapid data division and allocation on heterogeneous GPU nodes based on an innovative fuzzy neural network (FNN). A 5-state parameter feedback mechanism that defines the overall cluster and node performance is proposed. The corresponding FNN-based DLB model is capable of monitoring and predicting individual node performance under different workload scenarios. A real-time adaptive scheduler has been devised to reorganize the data inputs to each node when necessary to maintain runtime computational performance. The devised model has been implemented on two-dimensional (2D) discrete wavelet transform (DWT) applications for evaluation. Experimental results show that this DLB model enables high computational throughput while meeting the real-time and precision requirements of complex computational tasks.
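The general idea of dividing data among unequal GPU nodes can be illustrated with a minimal sketch. It is not the paper's FNN-based model: it assumes a much simpler rule in which each node receives rows in proportion to an estimated throughput, and the estimate is refreshed from measured timings by exponential smoothing. The function names `allocate_rows` and `update_throughputs`, the smoothing factor `alpha`, and the rows-per-second metric are all hypothetical choices made for illustration.

```python
from typing import List


def allocate_rows(total_rows: int, throughputs: List[float]) -> List[int]:
    """Split total_rows across nodes in proportion to estimated throughput.

    Illustrative stand-in for a load balancer; the paper's model predicts
    node performance with a fuzzy neural network instead.
    """
    if total_rows < 0 or not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("need a non-negative row count and positive throughputs")
    total = sum(throughputs)
    exact = [total_rows * t / total for t in throughputs]
    shares = [int(x) for x in exact]  # round each share down first
    # Give the rows lost to rounding to the nodes with the largest remainders.
    leftover = total_rows - sum(shares)
    order = sorted(range(len(exact)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def update_throughputs(old: List[float], rows: List[int],
                       seconds: List[float], alpha: float = 0.5) -> List[float]:
    """Blend previous estimates with the rows/second measured in the last round."""
    return [(1 - alpha) * o + alpha * (r / s) for o, r, s in zip(old, rows, seconds)]


if __name__ == "__main__":
    estimates = [3.0, 1.0]                 # assumed relative speeds of two nodes
    shares = allocate_rows(1000, estimates)
    print(shares)                          # rows assigned to each node
    # Feed back hypothetical timings, then re-allocate the next batch.
    estimates = update_throughputs(estimates, shares, [200.0, 125.0])
    print(allocate_rows(1000, estimates))
```

A static equal split corresponds to passing identical throughputs; the feedback step is what lets the split drift toward each node's observed speed over successive batches.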


Keywords: Heterogeneous GPU cluster, dynamic load balancing, fuzzy neural network, adaptive scheduler, discrete wavelet transform





Copyright information

© Institute of Automation, Chinese Academy of Sciences and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. School of Software Engineering, Chengdu University of Information Technology, Chengdu, China
  2. School of Computer Science, Chengdu University of Information Technology, Chengdu, China
  3. School of Computing & Engineering, University of Huddersfield, Queensgate, Huddersfield, UK
  4. Department of Computing, Sheffield Hallam University, Sheffield, UK
