Increasing the Robustness of CNN-Based Human Body Segmentation in Range Images by Modeling Sensor-Specific Artifacts

  • Lama Seoud
  • Jonathan Boisvert
  • Marc-Antoine Drouin
  • Michel Picard
  • Guy Godin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11131)

Abstract

This paper addresses the problem of human body part segmentation in range images acquired with a structured-light imaging system. We propose a solution based on a fully convolutional neural network trained on realistic synthetic data, simulated so as to closely emulate our structured-light imaging system together with its inherent artifacts such as occlusions, noise and missing data. Results on synthetic test data quantitatively demonstrate the performance of our method in identifying 33 body parts, with negligible confusion between the front and back of the body and between the left and right limbs. Our experiments highlight the importance of sensor-specific data augmentation in the training set for improving the robustness of the segmentation. Most importantly, when applied to range data actually acquired by our system, the method accurately segmented the different body parts in real time and with inter-frame consistency.
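The paper does not publish its simulation code; as a rough illustration of the kind of sensor-specific augmentation the abstract describes, the sketch below corrupts a clean synthetic depth map with two structured-light artifacts it mentions: additive measurement noise and missing-data regions. The function name, parameter names, and default values are all hypothetical, not the authors' implementation.

```python
import numpy as np

def augment_depth(depth, noise_std=2.0, dropout_frac=0.05, rng=None):
    """Apply sensor-style artifacts to a synthetic depth map (in mm).

    Simulates two artifacts of structured-light range imaging:
    additive Gaussian measurement noise, and missing-data holes
    (encoded as 0) such as those caused by occlusions and
    low-reflectance surfaces. All parameters are illustrative.
    """
    rng = np.random.default_rng(rng)
    out = depth.astype(np.float32).copy()
    valid = out > 0                      # 0 encodes "no measurement"
    # 1) additive Gaussian noise on valid (measured) pixels only
    out[valid] += rng.normal(0.0, noise_std, size=int(valid.sum()))
    # 2) random rectangular dropouts emulating missing data
    h, w = out.shape
    n_holes = max(1, int(dropout_frac * h * w / 64))  # ~8x8-pixel holes
    for _ in range(n_holes):
        y, x = rng.integers(0, h), rng.integers(0, w)
        out[y:y + 8, x:x + 8] = 0.0
    return out
```

Applying such a corruption to each synthetic training image forces the network to see, at training time, the same artifact statistics it will encounter in real acquisitions.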

Keywords

Human body segmentation · Structured-light imaging · Convolutional neural network


Copyright information

© Crown 2019

Authors and Affiliations

  • Lama Seoud (1)
  • Jonathan Boisvert (1)
  • Marc-Antoine Drouin (1)
  • Michel Picard (1)
  • Guy Godin (1)

  1. National Research Council, Ottawa, Canada
