Abstract
Multi-person articulated pose tracking is a recently proposed computer vision task that associates each person's articulated joints across frames to form pose trajectories. In this paper, we propose a region-based deep appearance model, combined with an LSTM pose model, to measure the similarity between identities. We also propose a hierarchical association method that reduces the time spent on deep feature extraction: the association procedure is divided into two stages, and deep features are extracted only when pairs of identities are difficult to distinguish. Extensive experiments are conducted on the recently released multi-person pose tracking benchmark PoseTrack. The results show that tracking accuracy improves noticeably when multiple association cues are used, and that the hierarchical association method noticeably increases tracking speed.
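The two-stage idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which cue drives the first stage or how "difficult to distinguish" is decided, so the bounding-box IoU cue, the thresholds `accept`, `margin` and `min_deep`, the greedy matching, and the `deep_similarity` callback (standing in for the appearance and pose models) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, iw) * max(0.0, ih)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hierarchical_associate(
    tracks: Sequence[Box],
    detections: Sequence[Box],
    deep_similarity: Callable[[int, int], float],  # hypothetical expensive cue
    accept: float = 0.5,    # assumed: minimum IoU for a cheap match
    margin: float = 0.2,    # assumed: required gap to the runner-up
    min_deep: float = 0.5,  # assumed: minimum deep similarity for a match
) -> Tuple[Dict[int, int], int]:
    """Return ({track index: detection index}, number of deep_similarity calls)."""
    matches: Dict[int, int] = {}
    used: Set[int] = set()
    hard: List[int] = []

    # Stage 1: cheap cue only. Accept a pair when it is clearly the best one.
    for t, tbox in enumerate(tracks):
        scores = sorted(((iou(tbox, d), j) for j, d in enumerate(detections)),
                        reverse=True)
        if not scores:
            continue
        best, j = scores[0]
        second = scores[1][0] if len(scores) > 1 else 0.0
        if best >= accept and best - second >= margin and j not in used:
            matches[t] = j
            used.add(j)
        else:
            hard.append(t)  # ambiguous: defer to the expensive cue

    # Stage 2: expensive cue, evaluated only for the ambiguous tracks and
    # only against unmatched detections that overlap them at all.
    deep_calls = 0
    for t in hard:
        best_j, best_s = None, min_deep
        for j, d in enumerate(detections):
            if j in used or iou(tracks[t], d) <= 0.0:
                continue
            s = deep_similarity(t, j)
            deep_calls += 1
            if s > best_s:
                best_j, best_s = j, s
        if best_j is not None:
            matches[t] = best_j
            used.add(best_j)
    return matches, deep_calls


# Toy example: one isolated person and two people standing close together.
tracks = [(0, 0, 10, 10), (20, 0, 30, 10), (24, 0, 34, 10)]
dets = [(1, 0, 11, 10), (22, 0, 32, 10), (23, 0, 33, 10)]
# Made-up similarity scores standing in for a learned model.
table = {(1, 1): 0.9, (1, 2): 0.3, (2, 1): 0.2, (2, 2): 0.8}
matches, calls = hierarchical_associate(tracks, dets, lambda t, j: table[(t, j)])
print(matches, calls)  # {0: 0, 1: 1, 2: 2} 3
```

In this toy case the isolated person is matched in the first stage without any deep feature computation, so the expensive cue is evaluated for 3 pairs instead of all 9. The greedy second stage only keeps the sketch dependency-free; an optimal assignment solver could take its place.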
Acknowledgments
This work is supported by the National High-Tech R&D Program (863 Program) under Grant 2015AA016402.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, C., Zhou, Y. (2018). Hierarchical Online Multi-person Pose Tracking with Multiple Cues. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11306. Springer, Cham. https://doi.org/10.1007/978-3-030-04224-0_27
DOI: https://doi.org/10.1007/978-3-030-04224-0_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04223-3
Online ISBN: 978-3-030-04224-0
eBook Packages: Computer Science, Computer Science (R0)