Abstract
Convolutional Neural Networks (CNNs) have a broad range of applications, such as image processing and natural language processing. Inspired by the mammalian visual cortex, CNNs have achieved impressive results on a number of computer vision challenges, but often by relying on large amounts of processing power and operating without timing constraints. This paper presents a design methodology for accelerating CNNs using hardware/software co-design techniques in order to balance performance and flexibility, particularly for resource-constrained systems. The methodology is applied to a gender recognition case study, using an ARM processor and FPGA fabric to create an embedded system that can process facial images in real time.
Cite this article
Chen, AY., Biglari-Abhari, M., Wang, KK. et al. Convolutional neural network acceleration with hardware/software co-design. Appl Intell 48, 1288–1301 (2018). https://doi.org/10.1007/s10489-017-1007-z
DOI: https://doi.org/10.1007/s10489-017-1007-z