Multi-Domain Pose Network for Multi-Person Pose Estimation and Tracking

  • Hengkai GuoEmail author
  • Tang Tang
  • Guozhong Luo
  • Riwei Chen
  • Yongchen Lu
  • Linfu Wen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11130)


Multi-person human pose estimation and tracking in the wild is important and challenging. For training a powerful model, large-scale training data are crucial. While there are several datasets for human pose estimation, the best practice for training on multi-dataset has not been investigated. In this paper, we present a simple network called Multi-Domain Pose Network (MDPN) to address this problem. By treating the task as multi-domain learning, our methods can learn a better representation for pose prediction. Together with prediction heads fine-tuning and multi-branch combination, it shows significant improvement over baselines and achieves the best performance on PoseTrack ECCV 2018 Challenge without additional datasets other than MPII and COCO.


Human pose estimation Multi-domain learning 


  1. 1.
    Andriluka, M., et al.: Posetrack: a benchmark for human pose estimation and tracking (2017)Google Scholar
  2. 2.
    Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: Computer Vision and Pattern Recognition, pp. 3686–3693 (2014)Google Scholar
  3. 3.
    Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1302–1310 (2017)Google Scholar
  4. 4.
    Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., Sun, J.: Cascaded pyramid network for multi-person pose estimation (2017)Google Scholar
  5. 5.
    Dai, J., et al.: Deformable convolutional networks. In: IEEE International Conference on Computer Vision, pp. 764–773 (2017)Google Scholar
  6. 6.
    Fang, H., Xie, S., Tai, Y.W., Lu, C.: RMPE: regional multi-person pose estimation. In: The IEEE International Conference on Computer Vision (ICCV), vol. 2 (2017)Google Scholar
  7. 7.
    Girdhar, R., Gkioxari, G., Torresani, L., Paluri, M., Du, T.: Detect-and-track: efficient pose estimation in videos (2018)Google Scholar
  8. 8.
    Guo, H., Wang, G., Chen, X., Zhang, C., Qiao, F., Yang, H.: Region ensemble network: improving convolutional network for hand pose estimation. In: IEEE International Conference on Image Processing, pp. 4512–4516 (2017)Google Scholar
  9. 9.
    He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)Google Scholar
  10. 10.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)Google Scholar
  11. 11.
    Jin, S., et al.: Towards multi-person pose tracking: bottom-up and top-down methods (2017)Google Scholar
  12. 12.
    Li, H., Li, Y., Porikli, F.: Convolutional neural net bagging for online visual tracking. Comput. Vis. Image Understand. 153, 120–129 (2016)CrossRefGoogle Scholar
  13. 13.
    Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). Scholar
  14. 14.
    Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4293–4302 (2016)Google Scholar
  15. 15.
    Papandreou, G., et al.: Towards accurate multi-person pose estimation in the wild, pp. 3711–3719 (2017)Google Scholar
  16. 16.
    Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement (2018)Google Scholar
  17. 17.
    Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2015)CrossRefGoogle Scholar
  18. 18.
    Xiao, B., Wu, H., Wei, Y.: Simple baselines for human pose estimation and tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 472–487. Springer, Cham (2018). Scholar
  19. 19.
    Xiao, T., Li, H., Ouyang, W., Wang, X.: Learning deep feature representations with domain guided dropout for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1249–1258 (2016)Google Scholar
  20. 20.
    Xiu, Y., Li, J., Wang, H., Fang, Y., Lu, C.: Pose flow: Efficient online pose tracking. arXiv preprint arXiv:1802.00977 (2018)
  21. 21.
    Zhu, X., Jiang, Y., Luo, Z.: Multi-person pose estimation for posetrack with enhanced part affinity fields. In: ICCV PoseTrack Workshop (2017)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Hengkai Guo
    • 1
    Email author
  • Tang Tang
    • 1
  • Guozhong Luo
    • 1
  • Riwei Chen
    • 1
  • Yongchen Lu
    • 1
  • Linfu Wen
    • 1
  1. 1.ByteDance AI LabBeijingChina

Personalised recommendations