Feature Visualization Based Stacked Convolutional Neural Network for Human Body Detection in a Depth Image

  • Xiao Liu
  • Ling Mei
  • Dakun Yang
  • Jianhuang Lai
  • Xiaohua Xie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11257)


Human body detection is a key technology in the field of biometric recognition, and detection in a depth image is particularly challenging because of heavy noise and the lack of texture information. To address this issue, we propose the feature visualization based stacked convolutional neural network (FV-SCNN), which can be trained by two-layer unsupervised learning. Specifically, each subsequent CNN layer is obtained by optimizing a sparse auto-encoder (SAE) on the reconstructed visualization of the preceding layer, so that it captures robust high-level features. Experiments on the SZU Depth Pedestrian dataset verify that the proposed method achieves favorable accuracy for body detection. The key to our method is that the CNN-based feature visualization performs data-driven processing of the depth map and significantly alleviates the influence of noise and corruption on body detection.
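As an illustration of the kind of objective an SAE typically minimizes, the following is a minimal sketch and not the authors' implementation. It assumes a single hidden layer with sigmoid units, a squared reconstruction error, and a KL-divergence sparsity penalty; the function names, the target sparsity `rho`, and the weight `beta` are hypothetical choices made for this example only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(x, W, b):
    """Sigmoid layer: returns sigmoid(W x + b) for one input vector x."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]

def kl_sparsity(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def sae_loss(batch, W_enc, b_enc, W_dec, b_dec, rho=0.05, beta=3.0):
    """Reconstruction error plus KL sparsity penalty (assumed SAE objective)."""
    hidden = [layer(x, W_enc, b_enc) for x in batch]
    recon = [layer(h, W_dec, b_dec) for h in hidden]
    # Squared reconstruction error, averaged over the batch.
    mse = sum(sum((r - v) ** 2 for r, v in zip(rec, x))
              for rec, x in zip(recon, batch)) / len(batch)
    # Mean activation of each hidden unit over the batch.
    rho_hat = [sum(h[j] for h in hidden) / len(batch)
               for j in range(len(b_enc))]
    penalty = sum(kl_sparsity(rho, r) for r in rho_hat)
    return mse + beta * penalty
```

In a stacked setting such as the one described above, a loss of this form would be minimized once per layer, with the input batch for the next layer taken from the reconstructed visualization of the previous one; how that visualization is computed is not specified here and is left outside the sketch.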


Human detection · Depth image · Feature visualization · Sparse auto-encoder · Convolutional neural network



This project is supported by the Natural Science Foundation of China (61573387, 61672544), Guangzhou Project (201807010070), and Fundamental Research Funds for the Central Universities (No. 161gpy41).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Sun Yat-sen University, Guangzhou, China
  2. Guangdong Key Laboratory of Information Security Technology, Guangzhou, China
  3. Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou, China
