Multiple human upper bodies detection via candidate-region convolutional neural network

Zhu, Aichun; Wang, Tian; Qiao, Tong

doi:10.1007/s11042-018-6964-7

Published: 13 December 2018

Volume 78, pages 16077–16096, (2019)

Multimedia Tools and Applications

Abstract

Upper body detection in images is a challenging task in practical application scenarios and shares all the difficulties of object detection. This paper focuses on the problems of detecting multiple upper bodies, including the diversity of appearances, the variation in object scale, and frequent occlusions. To address these problems, we divide upper body detection into two stages, forming a Candidate-Region Convolutional Neural Network (CR-CNN). In the upper body candidate generation stage, a deep hierarchical model is proposed. This model is built on a graphical model that contains an appearance model and a deformable model. The appearance model is built on the feature maps of a CNN, and the deformable model is defined over each pair of connected parts to compute the relative spatial information in the graphical model. In the upper body candidate refining stage, the detected bounding boxes serve as the candidate regions and are refined in the CR-CNN. Moreover, multiple convolutional features are introduced into the CR-CNN to provide local and contextual information. The proposed method is compared with the state of the art on the TV Human Interaction (TVHI) dataset and the HollywoodHeads dataset. The experimental results demonstrate the effectiveness of the proposed method.
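The abstract does not give the scoring function of the graphical model, so the following is only a minimal sketch of how an appearance term and a pairwise deformation term are commonly combined in part-based models. It assumes a quadratic deformation penalty and hand-written appearance scores standing in for CNN feature-map responses. The part names, anchor offsets, weights, and the functions `deformation_cost`, `configuration_score`, and `best_configuration` are all hypothetical and are not taken from the paper.

```python
from itertools import product
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Edge = Tuple[str, str]  # (child part, parent part)


def deformation_cost(child: Point, parent: Point, anchor: Point,
                     weights: Tuple[float, float]) -> float:
    """Quadratic penalty for a child part drifting away from its
    expected offset (anchor) relative to its parent. Assumed form."""
    dx = child[0] - parent[0] - anchor[0]
    dy = child[1] - parent[1] - anchor[1]
    return weights[0] * dx * dx + weights[1] * dy * dy


def configuration_score(locations: Dict[str, Point],
                        appearance: Dict[str, Dict[Point, float]],
                        edges: List[Edge],
                        anchors: Dict[Edge, Point],
                        weights: Dict[Edge, Tuple[float, float]]) -> float:
    """Sum of per-part appearance scores minus the deformation cost
    of every connected pair of parts."""
    score = sum(appearance[part][loc] for part, loc in locations.items())
    for child, parent in edges:
        edge = (child, parent)
        score -= deformation_cost(locations[child], locations[parent],
                                  anchors[edge], weights[edge])
    return score


def best_configuration(appearance: Dict[str, Dict[Point, float]],
                       edges: List[Edge],
                       anchors: Dict[Edge, Point],
                       weights: Dict[Edge, Tuple[float, float]]):
    """Brute-force search over all candidate locations (fine for a toy
    example; real systems use dynamic programming on the tree)."""
    parts = list(appearance)
    best, best_score = None, float("-inf")
    for combo in product(*(appearance[p] for p in parts)):
        locations = dict(zip(parts, combo))
        s = configuration_score(locations, appearance, edges, anchors, weights)
        if s > best_score:
            best, best_score = locations, s
    return best, best_score


# Toy data: hypothetical appearance scores at a few candidate locations.
appearance = {
    "head": {(5, 2): 0.9, (9, 2): 1.0},
    "torso": {(5, 6): 0.8},
}
edges = [("head", "torso")]
anchors = {("head", "torso"): (0, -4)}      # head expected 4 px above torso
weights = {("head", "torso"): (0.1, 0.1)}

best, best_score = best_configuration(appearance, edges, anchors, weights)
print(best, best_score)
```

In this toy case the head candidate with the higher appearance score lies far from its expected position above the torso, so the deformation penalty makes the spatially consistent candidate win. That is the role the abstract assigns to the relative spatial information between connected parts.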

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (61503017, 61702150) and the Aeronautical Science Foundation of China (2016ZC51022).

Author information

Authors and Affiliations

The School of Computer Science and Technology, Nanjing Tech University, Nanjing Shi, China
Aichun Zhu
School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
Tian Wang
School of Cyberspace, Hangzhou Dianzi University, Zhejiang Sheng, China
Tong Qiao

Corresponding authors

Correspondence to Aichun Zhu or Tian Wang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhu, A., Wang, T. & Qiao, T. Multiple human upper bodies detection via candidate-region convolutional neural network. Multimed Tools Appl 78, 16077–16096 (2019). https://doi.org/10.1007/s11042-018-6964-7

Received: 03 December 2017
Revised: 05 October 2018
Accepted: 27 November 2018
Published: 13 December 2018
Issue Date: 30 June 2019
DOI: https://doi.org/10.1007/s11042-018-6964-7
