Training neural networks by marginalizing out hidden layer noise

Neural Computing and Applications (ICONIP 2015)

Abstract

The generalization ability of neural networks is influenced by the size of the training set. The training process for single-hidden-layer feedforward neural networks (SLFNs) consists of two stages: nonlinear feature mapping and predictor optimization in the hidden layer space. In this paper, we propose a new approach, called marginalizing out hidden layer noise (MHLN), in which the predictor of an SLFN is trained on an effectively infinite number of samples. First, MHLN augments the training set in the hidden layer space with constrained samples, generated by corrupting the hidden layer outputs of the training set with a given noise distribution. For any training sample, as the number of corruptions approaches infinity, the explicitly generated constrained samples can, by the weak law of large numbers, be replaced with their expectations. In this way, the training set is implicitly extended in the hidden layer space by infinitely many constrained samples. MHLN then constructs the predictor of the SLFN by minimizing the expected value of a quadratic loss function under the given noise distribution. Experiments on twenty benchmark datasets show that MHLN improves generalization ability.
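
To make the marginalization step concrete, the sketch below works through one common special case: additive Gaussian noise on the hidden layer outputs combined with a squared loss, for which the expected loss has a closed form and the output weights admit a ridge-style solution. This is an illustrative reconstruction under those assumptions, not the paper's exact formulation; the function names, the tanh random feature mapping, and the default parameters are all hypothetical.

```python
import numpy as np

def train_mhln_gaussian(X, Y, n_hidden=200, noise_std=0.1, rng=None):
    """Sketch of marginalized hidden-layer-noise training for an SLFN.

    Assumes additive Gaussian corruption of the hidden layer outputs and a
    squared loss; under these assumptions the expected loss has a closed
    form, so the infinitely many corrupted copies never need to be sampled.
    Names and defaults are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape

    # Stage 1: nonlinear feature mapping (random hidden layer, ELM-style).
    W_in = rng.normal(size=(d, n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W_in + b)          # hidden layer outputs, shape (n, n_hidden)

    # Stage 2: marginalize out additive Gaussian noise eps ~ N(0, sigma^2 I):
    #   E_eps || Y - (H + eps) W ||^2 = || Y - H W ||^2 + n * sigma^2 * ||W||^2,
    # so minimizing the expected quadratic loss has a closed-form solution.
    sigma2 = noise_std ** 2
    A = H.T @ H + n * sigma2 * np.eye(n_hidden)
    W_out = np.linalg.solve(A, H.T @ Y)

    def predict(X_new):
        # Apply the same fixed feature mapping, then the learned output weights.
        return np.tanh(X_new @ W_in + b) @ W_out

    return predict
```

In this Gaussian, squared-loss case the noise variance acts as a data-independent regularization strength on the hidden-to-output weights, which is what allows the implicit infinite set of corrupted samples to be trained on without ever being generated.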

References

  1. Li YJ, Xin X, Guo P (2015) Neural networks with marginalized corrupted hidden layer. In: Proceedings of international conference on neural information processing, pp 506–514

  2. Burges CJC, Schölkopf B (1997) Improving the accuracy and speed of support vector machines. In: Advances in neural information processing systems, pp 375–381

  3. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536

  4. Wilamowski BM, Yu H (2010) Neural network learning without backpropagation. IEEE Trans Neural Netw 21(11):1793–1803

  5. Hagan MT, Menhaj MB (1994) Training feedforward networks with the Marquardt algorithm. IEEE Trans Neural Netw 5(6):989–993

  6. Branke J (1995) Evolutionary algorithms for neural network design and training. In: Proceedings of the first Nordic workshop on genetic algorithms and its applications

  7. Rosenblatt F (1962) Principles of neurodynamics: perceptrons and the theory of brain mechanisms. Spartan Books, New York

  8. Ding S, Xu X, Nie R (2014) Extreme learning machine and its applications. Neural Comput Appl 25(3–4):549–556

  9. Guo P, Lyu MR (2004) A pseudoinverse learning algorithm for feedforward neural networks with stacked generalization applications to software reliability growth data. Neurocomputing 56:101–121

  10. Vincent P, Larochelle H, Bengio Y, Manzagol P-A (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on machine learning, pp 1096–1103

  11. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P-A (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408

  12. Glorot X, Bordes A, Bengio Y (2011) Domain adaptation for large-scale sentiment classification: a deep learning approach. In: Proceedings of the 28th international conference on machine learning, pp 513–520

  13. Maillet F, Eck D, Desjardins G, Lamere P (2009) Steerable playlist generation by learning song similarity from radio station playlists. In: International society for music information retrieval conference, pp 345–350

  14. Xia B, Bao C (2014) Wiener filtering based speech enhancement with weighted denoising auto-encoder and noise classification. Speech Commun 60:13–29

  15. Chen M, Xu Z, Weinberger K, Sha F (2012) Marginalized denoising autoencoders for domain adaptation. In: Proceedings of the 29th international conference on machine learning, pp 767–774

  16. Maaten L, Chen M, Tyree S, Weinberger KQ (2013) Learning with marginalized corrupted features. In: Proceedings of the 30th international conference on machine learning, pp 410–418

  17. Herbrich R, Graepel T (2004) Invariant pattern recognition by semidefinite programming machines. In: Advances in neural information processing systems, pp 33–40

  18. Teo CH, Globerson A, Roweis ST, Smola AJ (2007) Convex learning with invariances. In: Advances in neural information processing systems, pp 1489–1496

  19. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580

  20. Wager S, Wang S, Liang PS (2013) Dropout training as adaptive regularization. In: Advances in neural information processing systems, pp 351–359

  21. Wang S, Manning C (2013) Fast dropout training. In: Proceedings of the 30th international conference on machine learning, pp 118–126

  22. Qian Q, Hu J, Jin R, Pei J, Zhu S (2014) Distance metric learning using dropout: a structured regularization approach. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 323–332

  23. Lawrence ND, Schölkopf B (2001) Estimating a kernel Fisher discriminant in the presence of label noise. In: Proceedings of the 18th international conference on machine learning, Citeseer, pp 306–313

  24. Chen M, Zheng A, Weinberger K (2013) Fast image tagging. In: Proceedings of the 30th international conference on machine Learning, pp 1274–1282

  25. Li Y, Yang M, Xu Z, Zhang ZM (2016) Learning with marginalized corrupted features and labels together. In: Thirtieth AAAI conference on artificial intelligence, pp 1251–1257

  26. Huang GB, Chen L (2007) Convex incremental extreme learning machine. Neurocomputing 70(16):3056–3062

  27. Huang GB, Chen L (2008) Enhanced random search based incremental extreme learning machine. Neurocomputing 71(16):3460–3468

  28. Duda RO, Hart PE, Stork DG (2012) Pattern classification. Wiley, Hoboken

  29. Allen DM (1974) The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16:125–127

  30. Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300

  31. Golub TR, Slonim DK, Tamayo P et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537

  32. Blake CL, Merz CJ (1998) UCI repository of machine learning databases. http://archive.ics.uci.edu/ml/datasets.html

Acknowledgements

This work was mainly supported by the National Natural Science Foundation of China (No. 61375045) and the Beijing Natural Science Foundation (No. 4142030).

Author information

Corresponding author

Correspondence to Ping Guo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

This paper is an extension of our work presented at ICONIP 2015 [1].

About this article

Cite this article

Li, Y., Guo, P. Training neural networks by marginalizing out hidden layer noise. Neural Comput & Applic 29, 401–412 (2018). https://doi.org/10.1007/s00521-017-2864-4
