An embedded implementation of CNN-based hand detection and orientation estimation algorithm

  • Li Yang
  • Zhi Qi
  • Zeheng Liu
  • Hao Liu
  • Ming Ling
  • Longxing Shi
  • Xinning Liu
Original Paper


Hand detection is an essential step in many tasks, including human-computer interaction (HCI) applications. However, robustly detecting hands of varied appearance against cluttered backgrounds, under motion blur, or under changing lighting remains challenging. Recently, CNN-based object detection methods have significantly improved hand detection accuracy, but at a high computational cost. In this paper, we propose a lightweight CNN that combines a modified MobileNet feature extractor with the SSD framework to detect hand location and orientation robustly and quickly. The network generates a set of feature maps at different resolutions to detect hands of different sizes. To improve robustness, we also employ a top-down feature fusion architecture that integrates context information across feature levels. To estimate hand orientation accurately with a CNN, we estimate the projections of two orthogonal vectors onto the horizontal and vertical axes and then recover the size and orientation of a bounding box that exactly encloses the hand. To deploy the detection algorithm on the embedded Jetson TK1 platform, we optimize the implementations of the network's building modules. Evaluated on the challenging Oxford hand dataset, our method (the code is available at …) reaches 83.2% average precision at 139 FPS on an NVIDIA Titan X, outperforming previous methods in both accuracy and efficiency. The embedded implementation of our algorithm reaches a processing speed of 16 FPS, which largely meets the requirement for real-time processing.
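The abstract describes the orientation estimate only at a high level, so the Python sketch below is not the authors' implementation; it assumes one plausible parameterization. The hypothetical helper `recover_box` takes a box center `(cx, cy)` and the horizontal and vertical projections of two orthogonal half-axis vectors, `u = (ux, uy)` along the hand and `v = (vx, vy)` across it, each assumed to run from the center to the midpoint of a box side. Under that assumption the box size follows from the vector lengths, the orientation from `atan2`, and the corners from center ± u ± v. The helpers `orthogonality_error` and `axis_aligned_extent` (also hypothetical) measure how far two predicted vectors are from orthogonal and give the upright box that encloses the rotated one.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class RotatedBox:
    cx: float
    cy: float
    length: float        # full extent along u (assumed: the hand's main axis)
    width: float         # full extent along v (perpendicular to u)
    angle_deg: float     # direction of u, measured from the +x image axis
    corners: List[Point]


def recover_box(cx: float, cy: float,
                ux: float, uy: float,
                vx: float, vy: float) -> RotatedBox:
    """Rebuild a rotated box from the x/y projections of two half-axis vectors.

    Hypothetical parameterization: u = (ux, uy) and v = (vx, vy) run from the
    box center to the midpoints of two adjacent sides, so the corners are
    center +/- u +/- v.
    """
    length = 2.0 * math.hypot(ux, uy)
    width = 2.0 * math.hypot(vx, vy)
    angle_deg = math.degrees(math.atan2(uy, ux))
    # Visit the four corners in order around the perimeter.
    corners = [(cx + su * ux + sv * vx, cy + su * uy + sv * vy)
               for su, sv in ((1, 1), (1, -1), (-1, -1), (-1, 1))]
    return RotatedBox(cx, cy, length, width, angle_deg, corners)


def orthogonality_error(ux: float, uy: float, vx: float, vy: float) -> float:
    """Cosine of the angle between u and v; 0.0 when they are exactly orthogonal."""
    nu, nv = math.hypot(ux, uy), math.hypot(vx, vy)
    if nu == 0.0 or nv == 0.0:
        raise ValueError("u and v must be non-zero vectors")
    return (ux * vx + uy * vy) / (nu * nv)


def axis_aligned_extent(ux: float, uy: float,
                        vx: float, vy: float) -> Tuple[float, float]:
    """Half-width and half-height of the upright box enclosing the rotated box."""
    return abs(ux) + abs(vx), abs(uy) + abs(vy)


if __name__ == "__main__":
    box = recover_box(0.0, 0.0, 3.0, 4.0, -2.0, 1.5)
    print(box.length, box.width, round(box.angle_deg, 2))
    print(box.corners)
```

For example, `recover_box(0.0, 0.0, 3.0, 4.0, -2.0, 1.5)` describes a box of length 10 and width 5 whose main axis lies about 53.13° from the x axis. Regressed vectors will generally not be exactly orthogonal, which is why a check such as `orthogonality_error` may be useful; the abstract does not say how the paper reconciles non-orthogonal predictions.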


Keywords: Hand detection; Hand orientation estimation; Convolutional neural network (CNN); Embedded implementation



This research was supported by the Provincial Natural Science Foundation of Jiangsu Province (Grant No. BK20181141), Key Science and Technology Projects in Jiangsu Province (Grant No. BE2018002-2), and the National Science and Technology Major Project (Grant No. 2017-ZX01030101).

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Li Yang (1)
  • Zhi Qi (1)
  • Zeheng Liu (1)
  • Hao Liu (1)
  • Ming Ling (1)
  • Longxing Shi (1)
  • Xinning Liu (1)
  1. National ASIC System Engineering Research Center, Southeast University, Nanjing, China