Detector-in-Detector: Multi-level Analysis for Human-Parts

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11362)

Abstract

Vision-based person, hand and face detection approaches have achieved remarkable success in recent years with the development of deep convolutional neural networks (CNNs). In this paper, we take the inherent correlation between the body and body parts into account and propose a new framework to boost the detection performance on multi-level objects. In particular, we adopt a region-based object detection structure with two carefully designed detectors that attend separately to the human body and to body parts in a coarse-to-fine manner; we call this the Detector-in-Detector network (DID-Net). The first detector is designed to detect the human body, hands and faces. The second detector, building on the body detections of the first, focuses mainly on detecting small hands and faces inside each body. The framework is trained end-to-end by optimizing a multi-task loss. Due to the lack of a human body, face and hand detection dataset, we have collected and labeled a new large dataset named Human-Parts, with 14,962 images and 106,879 annotations. Experiments show that our method achieves excellent performance on Human-Parts.
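
The coarse-to-fine idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the second detector consumes body detections, so the sketch assumes a hypothetical `second_stage` callable that returns hand and face boxes relative to the top-left corner of a body box. The names `to_image_coords` and `detector_in_detector` are likewise illustrative. It only shows how part detections found inside each body could be merged with the first detector's output.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Tuple[str, Box, float]       # (label, box, score)


def to_image_coords(body: Box, part: Box) -> Box:
    """Shift a box given relative to a body box back into image coordinates."""
    bx, by = body[0], body[1]
    return (part[0] + bx, part[1] + by, part[2] + bx, part[3] + by)


def detector_in_detector(
    first_stage: List[Detection],
    second_stage: Callable[[Box], List[Detection]],
) -> List[Detection]:
    """Merge first-stage detections with parts found inside each body box.

    `second_stage` is a hypothetical stand-in for the second detector: it
    takes a body box and returns hand/face detections whose coordinates are
    relative to that box's top-left corner.
    """
    results = list(first_stage)
    for label, box, _score in first_stage:
        if label != "body":
            continue  # only body detections are refined by the second stage
        for part_label, part_box, part_score in second_stage(box):
            results.append((part_label, to_image_coords(box, part_box), part_score))
    return results


# Toy usage with a stub second stage that always reports one face.
def stub_second_stage(body: Box) -> List[Detection]:
    return [("face", (10.0, 5.0, 30.0, 25.0), 0.9)]


first = [
    ("body", (100.0, 50.0, 200.0, 300.0), 0.95),
    ("hand", (120.0, 150.0, 135.0, 165.0), 0.6),
]
merged = detector_in_detector(first, stub_second_stage)
```

In a real system the merged list would still need duplicate suppression, since a hand or face may be reported by both detectors; the sketch leaves that step out.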

Author information

Corresponding author

Correspondence to Fuqiang Zhou.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, X., Yang, L., Song, Q., Zhou, F. (2019). Detector-in-Detector: Multi-level Analysis for Human-Parts. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11362. Springer, Cham. https://doi.org/10.1007/978-3-030-20890-5_15

  • DOI: https://doi.org/10.1007/978-3-030-20890-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20889-9

  • Online ISBN: 978-3-030-20890-5

  • eBook Packages: Computer Science (R0)
