An Improved Two-Stage Multi-person Pose Estimation Model

Wang, Sutong; Wang, Yanzhang; Wang, Xuehua; Ye, Xin; Li, Huaiming; Chen, Xuelong

doi:10.1007/978-981-15-1209-4_2

ORCID (Sutong Wang): orcid.org/0000-0001-6603-6047

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1103)

Included in the following conference series:

International Symposium on Knowledge and Systems Sciences

Abstract

Multi-person pose estimation plays a crucial role in behavior recognition in images and videos. With the development of deep learning, single-person pose estimation has become popular and achieves high prediction accuracy. Multi-person pose estimation, however, remains a major challenge and does not yet reach the same accuracy. The gap mainly results from rare, missing, or incorrect location detections and from overlapping poses, which are usually caused by incomplete person identification. We therefore propose an improved two-stage multi-person pose estimation model (ITMPE) to further improve the performance of multi-person pose estimation. In the first stage, Mask R-CNN is used for person identification. In the second stage, the processed images or videos, containing only the identified people, are fed into the OpenPose model for multi-person pose estimation. Comparative experiments show that the proposed model significantly outperforms the original model: on average, it reduces MSE and MAE by around 27.38% and 21.57%, and increases R² and Mean values by 49.80% and 96.91%, respectively. Comparison images show the improvements in person identification and in misclassification. More people are captured and assigned a pose estimate, which directly affects the performance of behavior recognition.
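The hand-off between the two stages and the reported error metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `keep_people_only` and `regression_metrics` are hypothetical, the person masks are assumed to be already available as boolean arrays (for example from a Mask R-CNN model), and the abstract does not state what quantity the metrics are computed over, so the inputs here are simply assumed to be paired ground-truth and predicted values.

```python
import numpy as np


def keep_people_only(image, person_masks):
    """Zero out every pixel that is not covered by any person mask.

    image: (H, W, 3) array.
    person_masks: (N, H, W) boolean array, assumed to come from an
    instance segmenter such as Mask R-CNN (not run here).
    """
    person_masks = np.asarray(person_masks, dtype=bool)
    if person_masks.shape[0] == 0:
        # No person identified: nothing is passed on to the pose stage.
        return np.zeros_like(image)
    union = np.any(person_masks, axis=0)  # (H, W) union of all people
    return image * union[..., None].astype(image.dtype)


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R^2 for paired true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "mse": float(np.mean(err ** 2)),
        "mae": float(np.mean(np.abs(err))),
        # R^2 is undefined when y_true is constant (ss_tot == 0).
        "r2": 1.0 - ss_res / ss_tot,
    }
```

In a full pipeline, the output of `keep_people_only` would be the input to a pose estimator such as OpenPose; that step is omitted because it depends on the specific model build in use.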

Author information

Authors and Affiliations

Institution of Information and Decision Technology, Dalian University of Technology, Dalian, 116023, China
Sutong Wang, Yanzhang Wang, Xuehua Wang, Xin Ye, Huaiming Li & Xuelong Chen

Corresponding author

Correspondence to Sutong Wang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jian Chen
Japan Advanced Institute of Science and Technology, Nomi, Japan
Van Nam Huynh
Duy Tan University, Da Nang, Vietnam
Gia-Nhu Nguyen
CAS Academy of Mathematics and Systems Sciences, Beijing, China
Xijin Tang

About this paper

Cite this paper

Wang, S., Wang, Y., Wang, X., Ye, X., Li, H., Chen, X. (2019). An Improved Two-Stage Multi-person Pose Estimation Model. In: Chen, J., Huynh, V., Nguyen, GN., Tang, X. (eds) Knowledge and Systems Sciences. KSS 2019. Communications in Computer and Information Science, vol 1103. Springer, Singapore. https://doi.org/10.1007/978-981-15-1209-4_2

DOI: https://doi.org/10.1007/978-981-15-1209-4_2
Published: 01 November 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1208-7
Online ISBN: 978-981-15-1209-4
eBook Packages: Computer Science; Computer Science (R0)
