Stochastic Decorrelation Constraint Regularized Auto-Encoder for Visual Recognition

  • Fengling Mao
  • Wei Xiong
  • Bo Du
  • Lefei Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10133)


Deep neural networks have achieved state-of-the-art performance in many applications such as image classification, object detection, and semantic segmentation, but optimizing networks with a huge number of parameters remains difficult. In this work, we propose a novel regularizer, the stochastic decorrelation constraint (SDC), imposed on the hidden layers of large networks, which can significantly improve their generalization capacity. SDC reduces the co-adaptation of hidden neurons in an explicit way, with a clear objective function. At the same time, we show that training a network with our regularizer has the effect of training an ensemble of exponentially many networks. We apply the proposed regularizer to the auto-encoder for visual recognition tasks. Compared to an auto-encoder without any regularizer, the SDC-constrained auto-encoder extracts features with less redundancy. Comparative experiments on the MNIST and FERET databases demonstrate the superiority of our method. When the size of the training data is reduced, optimizing the network becomes much more challenging, yet our method shows even larger advantages over conventional methods.
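The abstract does not give the SDC objective itself; as a rough illustration only, a common way to penalize co-adaptation explicitly is to suppress the off-diagonal entries of the mini-batch covariance of hidden activations, so that hidden units become pairwise decorrelated. The function name `sdc_penalty` and this particular formulation are assumptions for illustration, not the authors' exact loss:

```python
import numpy as np

def sdc_penalty(H):
    """Hypothetical sketch of a decorrelation-style penalty.

    H: (batch_size, n_hidden) matrix of hidden-layer activations.
    Returns half the squared Frobenius norm of the off-diagonal part
    of the batch covariance matrix, which is zero exactly when the
    hidden units are uncorrelated over the batch.
    """
    Hc = H - H.mean(axis=0, keepdims=True)   # center each hidden unit
    C = Hc.T @ Hc / H.shape[0]               # batch covariance estimate
    off_diag = C - np.diag(np.diag(C))       # keep only cross-covariances
    return 0.5 * np.sum(off_diag ** 2)

# Illustration: redundant (duplicated) units are penalized far more
# than independent ones.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
H_redundant = np.hstack([x, x])              # two identical hidden units
H_independent = rng.normal(size=(100, 2))    # two independent hidden units
print(sdc_penalty(H_redundant) > sdc_penalty(H_independent))
```

In training, such a term would typically be added to the auto-encoder's reconstruction loss with a weighting coefficient, e.g. `loss = reconstruction_loss + lam * sdc_penalty(H)`.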


Keywords: Deep learning · Visual recognition · Auto-encoder



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. School of Computer, Wuhan University, Wuhan, China
  2. Collaborative Innovation Center of Geospatial Technology, Wuhan, China
