
Convolutional neural network acceleration with hardware/software co-design

Applied Intelligence

Abstract

Convolutional Neural Networks (CNNs) have a broad range of applications, such as image processing and natural language processing. Inspired by the mammalian visual cortex, CNNs have achieved impressive results on a number of computer vision challenges, but these results are often obtained with large amounts of processing power and without timing constraints. This paper presents a design methodology for accelerating CNNs using hardware/software co-design techniques, with the aim of balancing performance and flexibility, particularly in resource-constrained systems. The methodology is applied to a gender recognition case study, in which an ARM processor and FPGA fabric are combined into an embedded system that processes facial images in real time.
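For readers new to the workload: the convolution layers typically dominate the arithmetic in CNN inference, and their regular multiply-accumulate loop nest is the kind of kernel a hardware/software partition usually considers moving into FPGA fabric. The sketch below is a minimal, single-channel, "valid" (no padding, stride 1) convolution in plain Python. The function names `conv2d_valid` and `conv2d_macs` are hypothetical, and this illustrates the generic operation only; it is not the authors' network, implementation, or partitioning.

```python
def conv2d_valid(image, kernel):
    """Single-channel 2D convolution, stride 1, no padding.

    Like most CNN frameworks, this computes cross-correlation
    (the kernel is not flipped).
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1  # output size for "valid" convolution
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            # Inner multiply-accumulate loops: the part an FPGA pipeline
            # would typically unroll and run in parallel.
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
            out[y][x] = acc
    return out


def conv2d_macs(ih, iw, kh, kw):
    """Number of multiply-accumulates performed by conv2d_valid."""
    return (ih - kh + 1) * (iw - kw + 1) * kh * kw


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(conv2d_valid(img, [[1, 0], [0, -1]]))
    # → [[-5.0, -5.0, -5.0], [-5.0, -5.0, -5.0], [-5.0, -5.0, -5.0]]
    print(conv2d_macs(4, 4, 2, 2))  # → 36
```

Each output pixel costs kh × kw multiply-accumulates, so the total grows with image size, kernel size, and (in a real network) the number of input/output channels. That regular, data-parallel arithmetic suits a hardware pipeline, while control-heavy stages can remain in software on the processor.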




Author information

Correspondence to Andrew Tzer-Yeu Chen.


About this article


Cite this article

Chen, AY., Biglari-Abhari, M., Wang, KK. et al. Convolutional neural network acceleration with hardware/software co-design. Appl Intell 48, 1288–1301 (2018). https://doi.org/10.1007/s10489-017-1007-z


  • DOI: https://doi.org/10.1007/s10489-017-1007-z
