Detector-in-Detector: Multi-level Analysis for Human-Parts

Li, Xiaojie; Yang, Lu; Song, Qing; Zhou, Fuqiang

doi:10.1007/978-3-030-20890-5_15

Conference paper
First Online: 02 June 2019


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11362)

Abstract

Vision-based person, hand, and face detection approaches have achieved remarkable success in recent years with the development of deep convolutional neural networks (CNNs). In this paper, we take the inherent correlation between the body and body parts into account and propose a new framework to boost detection performance on multi-level objects. In particular, we adopt a region-based object detection structure with two carefully designed detectors that attend separately to the human body and to body parts in a coarse-to-fine manner, which we call the Detector-in-Detector network (DID-Net). The first detector is designed to detect the human body, hands, and faces. The second detector, based on the body detection results of the first detector, focuses mainly on detecting small hands and faces inside each body. The framework is trained end-to-end by optimizing a multi-task loss. Because no suitable dataset for joint human body, face, and hand detection was available, we collected and labeled a new large dataset named Human-Parts with 14,962 images and 106,879 annotations. Experiments show that our method achieves excellent performance on Human-Parts.
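
The coarse-to-fine idea in the abstract can be illustrated with a minimal Python sketch. It is not the DID-Net implementation: the abstract does not say how the second detector consumes the body detections, so the sketch assumes that part boxes are predicted relative to each body box and then shifted back into image coordinates. The names `to_image_coords`, `detect_coarse_to_fine`, `body_detector`, and `part_detector` are hypothetical and stand in for trained models.

```python
from typing import Any, Callable, List, Tuple

# A box is (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]
LabeledBox = Tuple[str, Box]


def to_image_coords(body: Box, part_in_body: Box) -> Box:
    """Shift a part box given relative to a body box's top-left corner
    back into full-image coordinates (assumed convention, no rescaling)."""
    bx1, by1, _, _ = body
    px1, py1, px2, py2 = part_in_body
    return (bx1 + px1, by1 + py1, bx1 + px2, by1 + py2)


def detect_coarse_to_fine(
    image: Any,
    body_detector: Callable[[Any], List[Box]],
    part_detector: Callable[[Any, Box], List[LabeledBox]],
) -> List[Tuple[Box, List[LabeledBox]]]:
    """Run a coarse body detector, then a fine part detector per body.

    Both detectors are hypothetical callables: body_detector returns body
    boxes in image coordinates, and part_detector returns labeled part
    boxes relative to the given body box.
    """
    results = []
    for body in body_detector(image):
        parts = [
            (label, to_image_coords(body, box))
            for label, box in part_detector(image, body)
        ]
        results.append((body, parts))
    return results
```

With stub detectors that return one body at `(100, 50, 300, 450)` and one face at `(10, 20, 60, 80)` relative to it, the face lands at `(110, 70, 160, 130)` in image coordinates. Grouping the output by body is a design choice of this sketch that keeps the body-to-part association explicit.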

Author information

Authors and Affiliations

Beihang University, Beijing, 100191, China
Xiaojie Li & Fuqiang Zhou
Beijing University of Posts and Telecommunications, Beijing, 100876, China
Lu Yang & Qing Song

Authors

Xiaojie Li
Lu Yang
Qing Song
Fuqiang Zhou

Corresponding author

Correspondence to Fuqiang Zhou.

Editor information

Editors and Affiliations

IIIT Hyderabad, Hyderabad, India
C. V. Jawahar
ANU, Canberra, ACT, Australia
Hongdong Li
Simon Fraser University, Burnaby, BC, Canada
Greg Mori
ETH Zurich, Zurich, Zürich, Switzerland
Konrad Schindler

About this paper

Cite this paper

Li, X., Yang, L., Song, Q., Zhou, F. (2019). Detector-in-Detector: Multi-level Analysis for Human-Parts. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11362. Springer, Cham. https://doi.org/10.1007/978-3-030-20890-5_15

DOI: https://doi.org/10.1007/978-3-030-20890-5_15
Published: 02 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20889-9
Online ISBN: 978-3-030-20890-5
eBook Packages: Computer Science (R0)