A hybrid GPU-FPGA based design methodology for enhancing machine learning applications performance

  • Xu Liu
  • Hibat-Allah Ounifi
  • Abdelouahed Gherbi
  • Wubin Li
  • Mohamed Cheriet
Original Research


The high-density computing requirements of machine learning (ML) pose a challenging performance bottleneck. Constrained by sequential instruction execution, traditional general-purpose processors are not well suited to efficient ML. In this work, we present an ML system design methodology based on GPUs and FPGAs to tackle this problem. The core idea of our proposal is that, when designing an ML platform, we leverage the graphics processing unit (GPU)’s high-density computing to perform model training and exploit the field-programmable gate array (FPGA)’s low latency to perform model inference. In between, we define a model converter, which transforms the model produced by the training module into one usable by the inference module. We evaluated our approach through two use cases: the first is handwritten digit recognition with a convolutional neural network, while the second predicts a data center’s power usage effectiveness with a deep neural network regression algorithm. The experimental results indicate that our solution can take advantage of the GPU’s and FPGA’s parallel computing capacity to improve the efficiency of training and inference significantly. Meanwhile, the solution preserves accuracy and mean square error when converting models between the different frameworks.
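The train/convert/infer split described above can be sketched in miniature. The snippet below is purely illustrative and does not reproduce the paper's actual converter: it mimics a "GPU-side" floating-point dense layer, a "model converter" that quantizes the parameters to fixed point (a common FPGA-side representation), and an "FPGA-side" integer inference path, then checks that the converted model's outputs stay close to the reference. All function names and the Q-format choice are assumptions for the sketch.

```python
def infer_float(weights, bias, x):
    """Reference 'GPU-side' inference: y = Wx + b for one dense layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def convert_to_fixed_point(weights, bias, frac_bits=12):
    """'Model converter': quantize float parameters to Q-format integers."""
    scale = 1 << frac_bits
    qw = [[round(w * scale) for w in row] for row in weights]
    qb = [round(b * scale) for b in bias]
    return qw, qb, scale

def infer_fixed(qw, qb, scale, x):
    """'FPGA-side' inference: integer accumulation, rescaled at the end."""
    qx = [round(xi * scale) for xi in x]
    # Products of two scaled values carry scale**2; bias is lifted to match.
    acc = [sum(w * xi for w, xi in zip(row, qx)) + b * scale
           for row, b in zip(qw, qb)]
    return [a / (scale * scale) for a in acc]

# Toy parameters standing in for a trained model.
weights = [[0.5, -0.25], [0.1, 0.9]]
bias = [0.05, -0.1]
x = [1.0, 2.0]

y_ref = infer_float(weights, bias, x)
qw, qb, scale = convert_to_fixed_point(weights, bias)
y_fpga = infer_fixed(qw, qb, scale, x)
max_err = max(abs(a - b) for a, b in zip(y_ref, y_fpga))
```

The final comparison is the point of the sketch: as long as the quantization error stays small, the converted model preserves the reference outputs, which is the property the paper reports for accuracy and mean square error across frameworks.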


Machine learning · High performance computing · Heterogeneous computing · Hybrid platform · GPU computing · FPGA computing · CNN · DNN · Model converting · PUE



This work is partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), Ericsson Research Canada and the Canada Research Chair in Sustainable Smart Eco-Cloud. We would also like to thank Yves Lemieux for his insightful feedback during the research work.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Xu Liu
    • 1
  • Hibat-Allah Ounifi
    • 2
  • Abdelouahed Gherbi
    • 2
  • Wubin Li
    • 3
  • Mohamed Cheriet
    • 1
  1. Synchromedia Laboratory, University of Québec (ÉTS), Montréal, Canada
  2. University of Québec (ÉTS), Montréal, Canada
  3. Ericsson Research, Ericsson, Montréal, Canada
