Multimedia Tools and Applications, Volume 78, Issue 2, pp 2571–2586

Gradual recovery based occluded digit images recognition

  • Yasi Wang
  • Hongxun Yao (Email author)
  • Wei Yu
  • Dong Wang
  • Shangchen Zhou
  • Xiaoshuai Sun


Recent research shows that auto-encoders are well suited to modeling variations that change smoothly. In this paper, we use auto-encoders to recognize partially occluded digit images through gradual recovery. We propose a new variant of the auto-encoder, the “generalized auto-encoder”, and construct stacked generalized auto-encoders (SGAE) for recovering and recognizing occluded digit images. Rather than recovering the occluded region directly, we treat the degree of occlusion as a continuous variable and the recovery task as a gradual process. We divide the whole task into multiple intermediate recovery procedures and assign each procedure to one generalized auto-encoder, thus handling the recovery problem step by step. Given the encouraging recovery results, the occluded digit images can be recognized well. The results demonstrate that gradual recovery outperforms direct recovery of the occluded region. Although the main application in this paper is occluded digit image recognition, the proposed framework generalizes readily to other problems. Extensive experiments verify our settings and show the effectiveness, extensibility and generalizability of the method.
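
The gradual recovery idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no architecture, so the one-hidden-layer network, the occlusion model (zeroing the top rows of an image), the names `occlude`, `StageAutoEncoder`, `train_chain` and `recover`, the random stand-in data, and all hyperparameters are assumptions made purely for illustration. The only point it tries to show is the chaining: each stage is trained to map images at one occlusion degree to the same images at a smaller degree.

```python
import numpy as np


def occlude(images, rows):
    """Zero the top `rows` rows of each image (hypothetical occlusion model)."""
    out = images.copy()
    out[:, :rows, :] = 0.0
    return out


class StageAutoEncoder:
    """One-hidden-layer network whose target differs from its input.

    Assumed stand-in for one "generalized auto-encoder" stage.
    """

    def __init__(self, n_in, n_hidden, rng):
        self.W1 = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_in))
        self.b2 = np.zeros(n_in)

    def _forward(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return h, h @ self.W2 + self.b2

    def fit(self, x, target, epochs=100, lr=0.01):
        """Plain batch gradient descent on the mean squared error."""
        n = len(x)
        for _ in range(epochs):
            h, y = self._forward(x)
            d_y = (y - target) / n                    # gradient w.r.t. output
            d_h = (d_y @ self.W2.T) * (1.0 - h ** 2)  # back through tanh
            self.W2 -= lr * (h.T @ d_y)
            self.b2 -= lr * d_y.sum(axis=0)
            self.W1 -= lr * (x.T @ d_h)
            self.b1 -= lr * d_h.sum(axis=0)
        return self

    def predict(self, x):
        return self._forward(x)[1]


def train_chain(clean, levels, n_hidden, rng):
    """Train one stage per step between consecutive occlusion degrees.

    `levels` lists occlusion degrees from heaviest to lightest, e.g. [6, 4, 2, 0].
    Feeding each stage the previous stage's output is an assumption of this sketch.
    """
    current = occlude(clean, levels[0]).reshape(len(clean), -1)
    stages = []
    for next_level in levels[1:]:
        target = occlude(clean, next_level).reshape(len(clean), -1)
        stage = StageAutoEncoder(current.shape[1], n_hidden, rng).fit(current, target)
        stages.append(stage)
        current = stage.predict(current)
    return stages


def recover(stages, occluded):
    """Pass heavily occluded images through every stage in order."""
    x = occluded.reshape(len(occluded), -1)
    for stage in stages:
        x = stage.predict(x)
    return x.reshape(occluded.shape)


# Tiny usage example on random 8x8 "images" (stand-in data, not digits).
rng = np.random.default_rng(0)
clean = rng.random((32, 8, 8))
levels = [6, 4, 2, 0]
stages = train_chain(clean, levels, n_hidden=16, rng=rng)
recovered = recover(stages, occlude(clean, levels[0]))
```

A direct-recovery baseline in the same sketch would be a single `StageAutoEncoder` trained from degree 6 straight to degree 0; the abstract reports that the gradual variant performs better, but this toy example makes no attempt to reproduce that result.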


Keywords: Stacked generalized auto-encoders · Gradual occlusion recovery · Occluded digit images recognition · Convolutional neural network




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Yasi Wang
    • 1
  • Hongxun Yao
    • 1
    Email author
  • Wei Yu
    • 1
  • Dong Wang
    • 2
  • Shangchen Zhou
    • 1
  • Xiaoshuai Sun
    • 1
  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
  2. State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
