Abstract
The generalization ability of neural networks is influenced by the size of the training set. Training a single-hidden-layer feedforward neural network (SLFN) consists of two stages: nonlinear feature mapping and predictor optimization in the hidden-layer space. In this paper, we propose a new approach, called marginalizing out hidden layer noise (MHLN), in which the predictor of an SLFN is trained with infinitely many samples. First, MHLN augments the training set in the hidden-layer space with constrained samples, which are generated by corrupting the hidden-layer outputs of the training samples with given noise. For any training sample, as the number of corruptions approaches infinity, the weak law of large numbers allows the explicitly generated constrained samples to be replaced by their expectations. In this way, the training set is implicitly extended in the hidden-layer space by an infinite number of constrained samples. MHLN then constructs the predictor of the SLFN by optimizing the expected value of a quadratic loss function under the given noise distribution. Experiments on twenty benchmark datasets show that MHLN achieves better generalization performance.
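To illustrate the idea of marginalizing rather than sampling corruptions, the sketch below uses additive Gaussian noise on the hidden-layer outputs, for which the expected quadratic loss has a well-known closed form (a ridge-style penalty). This is a minimal, hypothetical example assuming Gaussian corruption; the variable names (`H`, `sigma`, `W_out`) and the toy SLFN with random input weights are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SLFN: a random input-to-hidden mapping provides the nonlinear features.
X = rng.standard_normal((100, 5))      # 100 samples, 5 input features
Y = rng.standard_normal((100, 1))      # regression targets
W_in = rng.standard_normal((5, 20))    # random input weights (fixed)
H = np.tanh(X @ W_in)                  # hidden-layer outputs, shape (100, 20)

sigma = 0.3                            # std of the additive hidden-layer noise
N = H.shape[0]

# For a corrupted output h_tilde = h + eps with eps ~ N(0, sigma^2 I),
# the expectation of the quadratic loss over the noise is
#   E[(w^T h_tilde - y)^2] = (w^T h - y)^2 + sigma^2 ||w||^2.
# Summing over all samples and minimizing in closed form is equivalent to
# training on infinitely many corrupted copies without generating any of them:
W_out = np.linalg.solve(H.T @ H + N * sigma**2 * np.eye(H.shape[1]), H.T @ Y)

pred = H @ W_out                       # predictions of the trained SLFN
```

Note that for this particular noise model the marginalized predictor coincides with a Tikhonov-regularized least-squares solution; other noise distributions (e.g. dropout-style corruption) yield different, data-dependent regularizers.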
References
Li YJ, Xin X, Guo P (2015) Neural networks with marginalized corrupted hidden layer. In: Proceedings of international conference on neural information processing, pp 506–514
Burges CJC, Schölkopf B (1997) Improving the accuracy and speed of support vector machines. In: Advances in neural information processing systems, pp 375–381
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536
Wilamowski BM, Yu H (2010) Neural network learning without backpropagation. IEEE Trans Neural Netw 21(11):1793–1803
Hagan MT, Menhaj MB (1994) Training feedforward networks with the Marquardt algorithm. IEEE Trans Neural Netw 5(6):989–993
Branke J (1995) Evolutionary algorithms for neural network design and training. In: Proceedings of the first Nordic workshop on genetic algorithms and its applications
Rosenblatt F (1962) Principles of neurodynamics: perceptrons and the theory of brain mechanisms. Spartan Books, New York
Ding S, Xu X, Nie R (2014) Extreme learning machine and its applications. Neural Comput Appl 25(3–4):549–556
Guo P, Lyu MR (2004) A pseudoinverse learning algorithm for feedforward neural networks with stacked generalization applications to software reliability growth data. Neurocomputing 56:101–121
Vincent P, Larochelle H, Bengio Y, Manzagol P-A (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on machine learning, pp 1096–1103
Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P-A (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408
Glorot X, Bordes A, Bengio Y (2011) Domain adaptation for large-scale sentiment classification: a deep learning approach. In: Proceedings of the 28th international conference on machine learning, pp 513–520
Maillet F, Eck D, Desjardins G, Lamere P (2009) Steerable playlist generation by learning song similarity from radio station playlists. In: International society for music information retrieval conference, pp 345–350
Xia B, Bao C (2014) Wiener filtering based speech enhancement with weighted denoising auto-encoder and noise classification. Speech Commun 60:13–29
Chen M, Xu Z, Weinberger K, Sha F (2012) Marginalized denoising autoencoders for domain adaptation. In: Proceedings of the 29th international conference on machine learning, pp 767–774
Maaten L, Chen M, Tyree S, Weinberger KQ (2013) Learning with marginalized corrupted features. In: Proceedings of the 30th international conference on machine learning, pp 410–418
Herbrich R, Graepel T (2004) Invariant pattern recognition by semidefinite programming machines. In: Advances in neural information processing systems, pp 33–40
Teo CH, Globerson A, Roweis ST, Smola AJ (2007) Convex learning with invariances. In: Advances in neural information processing systems, pp 1489–1496
Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580
Wager S, Wang S, Liang PS (2013) Dropout training as adaptive regularization. In: Advances in neural information processing systems, pp 351–359
Wang S, Manning C (2013) Fast dropout training. In: Proceedings of the 30th international conference on machine learning, pp 118–126
Qian Q, Hu J, Jin R, Pei J, Zhu S (2014) Distance metric learning using dropout: a structured regularization approach. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 323–332
Lawrence ND, Schölkopf B (2001) Estimating a kernel fisher discriminant in the presence of label noise. In: Proceedings of the 18th international conference on machine learning, Citeseer, pp 306–313
Chen M, Zheng A, Weinberger K (2013) Fast image tagging. In: Proceedings of the 30th international conference on machine learning, pp 1274–1282
Li Y, Yang M, Xu Z, Zhang ZM (2016) Learning with marginalized corrupted features and labels together. In: Thirtieth AAAI conference on artificial intelligence, pp 1251–1257
Huang GB, Chen L (2007) Convex incremental extreme learning machine. Neurocomputing 70(16):3056–3062
Huang GB, Chen L (2008) Enhanced random search based incremental extreme learning machine. Neurocomputing 71(16):3460–3468
Duda RO, Hart PE, Stork DG (2012) Pattern classification. Wiley, Hoboken
Allen DM (1974) The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16:125–127
Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300
Golub TR, Slonim DK, Tamayo P et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537
Blake CL, Merz CJ (1998) UCI repository of machine learning databases. http://archive.ics.uci.edu/ml/datasets.html
Acknowledgements
Our work is mainly supported by the National Natural Science Foundation of China (No. 61375045) and the Beijing Natural Science Foundation (No. 4142030).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
This paper is an extension of our work in ICONIP 2015 [1].
Li, Y., Guo, P. Training neural networks by marginalizing out hidden layer noise. Neural Comput & Applic 29, 401–412 (2018). https://doi.org/10.1007/s00521-017-2864-4