Abstract
The generalization ability of neural networks is influenced by the size of the training set. Training a single-hidden-layer feedforward neural network (SLFN) consists of two stages: nonlinear feature mapping and predictor optimization in the hidden-layer space. In this paper, we propose a new approach, called marginalizing out hidden layer noise (MHLN), in which the predictor of an SLFN is trained with infinitely many samples. First, MHLN augments the training set in the hidden-layer space with constrained samples, which are generated by corrupting the hidden-layer outputs of the training samples with given noise. For any training sample, as the number of corruptions approaches infinity, the weak law of large numbers allows the explicitly generated constrained samples to be replaced by their expectations. In this way, the training set is implicitly extended in the hidden-layer space by an infinite number of constrained samples. MHLN then constructs the predictor of the SLFN by optimizing the expected value of a quadratic loss function under the given noise distribution. Experiments on twenty benchmark datasets show that MHLN achieves better generalization performance.
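To illustrate the idea of marginalizing rather than sampling corruptions, the sketch below uses additive Gaussian noise on the hidden-layer outputs, for which the expected quadratic loss has a well-known closed form (a ridge-style penalty). This is a minimal, hypothetical example assuming Gaussian corruption; the variable names (`H`, `sigma`, `W_out`) and the toy SLFN with random input weights are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SLFN: a random input-to-hidden mapping provides the nonlinear features.
X = rng.standard_normal((100, 5))      # 100 samples, 5 input features
Y = rng.standard_normal((100, 1))      # regression targets
W_in = rng.standard_normal((5, 20))    # random input weights (fixed)
H = np.tanh(X @ W_in)                  # hidden-layer outputs, shape (100, 20)

sigma = 0.3                            # std of the additive hidden-layer noise
N = H.shape[0]

# For a corrupted output h_tilde = h + eps with eps ~ N(0, sigma^2 I),
# the expectation of the quadratic loss over the noise is
#   E[(w^T h_tilde - y)^2] = (w^T h - y)^2 + sigma^2 ||w||^2.
# Summing over all samples and minimizing in closed form is equivalent to
# training on infinitely many corrupted copies without generating any of them:
W_out = np.linalg.solve(H.T @ H + N * sigma**2 * np.eye(H.shape[1]), H.T @ Y)

pred = H @ W_out                       # predictions of the trained SLFN
```

Note that for this particular noise model the marginalized predictor coincides with a Tikhonov-regularized least-squares solution; other noise distributions (e.g. dropout-style corruption) yield different, data-dependent regularizers.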
References
Li YJ, Xin X, Guo P (2015) Neural networks with marginalized corrupted hidden layer. In: Proceedings of international conference on neural information processing, pp 506–514
Burges CJC, Schölkopf B (1997) Improving the accuracy and speed of support vector machines. In: Advances in neural information processing systems, pp 375–381
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536
Wilamowski BM, Yu H (2010) Neural network learning without backpropagation. IEEE Trans Neural Netw 21(11):1793–1803
Hagan MT, Menhaj MB (1994) Training feedforward networks with the Marquardt algorithm. IEEE Trans Neural Netw 5(6):989–993
Branke J (1995) Evolutionary algorithms for neural network design and training. In: Proceedings of the first Nordic workshop on genetic algorithms and its applications
Rosenblatt F (1962) Principles of neurodynamics: perceptrons and the theory of brain mechanisms. Spartan Books, New York
Ding S, Xu X, Nie R (2014) Extreme learning machine and its applications. Neural Comput Appl 25(3–4):549–556
Guo P, Lyu MR (2004) A pseudoinverse learning algorithm for feedforward neural networks with stacked generalization applications to software reliability growth data. Neurocomputing 56:101–121
Vincent P, Larochelle H, Bengio Y, Manzagol P-A (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on machine learning, pp 1096–1103
Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P-A (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408
Glorot X, Bordes A, Bengio Y (2011) Domain adaptation for large-scale sentiment classification: a deep learning approach. In: Proceedings of the 28th international conference on machine learning, pp 513–520
Maillet F, Eck D, Desjardins G, Lamere P (2009) Steerable playlist generation by learning song similarity from radio station playlists. In: International society for music information retrieval conference, pp 345–350
Xia B, Bao C (2014) Wiener filtering based speech enhancement with weighted denoising auto-encoder and noise classification. Speech Commun 60:13–29
Chen M, Xu Z, Weinberger K, Sha F (2012) Marginalized denoising autoencoders for domain adaptation. In: Proceedings of the 29th international conference on machine learning, pp 767–774
Maaten L, Chen M, Tyree S, Weinberger KQ (2013) Learning with marginalized corrupted features. In: Proceedings of the 30th international conference on machine learning, pp 410–418
Herbrich R, Graepel T (2004) Invariant pattern recognition by semidefinite programming machines. In: Advances in neural information processing systems, pp 33–40
Teo CH, Globerson A, Roweis ST, Smola AJ (2007) Convex learning with invariances. In: Advances in neural information processing systems, pp 1489–1496
Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580
Wager S, Wang S, Liang PS (2013) Dropout training as adaptive regularization. In: Advances in neural information processing systems, pp 351–359
Wang S, Manning C (2013) Fast dropout training. In: Proceedings of the 30th international conference on machine learning, pp 118–126
Qian Q, Hu J, Jin R, Pei J, Zhu S (2014) Distance metric learning using dropout: a structured regularization approach. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 323–332
Lawrence ND, Schölkopf B (2001) Estimating a kernel fisher discriminant in the presence of label noise. In: Proceedings of the 18th international conference on machine learning, Citeseer, pp 306–313
Chen M, Zheng A, Weinberger K (2013) Fast image tagging. In: Proceedings of the 30th international conference on machine learning, pp 1274–1282
Li Y, Yang M, Xu Z, Zhang ZM (2016) Learning with marginalized corrupted features and labels together. In: Thirtieth AAAI conference on artificial intelligence, pp 1251–1257
Huang GB, Chen L (2007) Convex incremental extreme learning machine. Neurocomputing 70(16):3056–3062
Huang GB, Chen L (2008) Enhanced random search based incremental extreme learning machine. Neurocomputing 71(16):3460–3468
Duda RO, Hart PE, Stork DG (2012) Pattern classification. Wiley, Hoboken
Allen DM (1974) The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16:125–127
Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300
Golub TR, Slonim DK, Tamayo P et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537
Blake CL, Merz CJ (1998) UCI repository of machine learning databases. http://archive.ics.uci.edu/ml/datasets.html
Acknowledgements
Our work is mainly supported by the National Natural Science Foundation of China (No. 61375045) and the Beijing Natural Science Foundation (No. 4142030).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
This paper is an extension of our work in ICONIP 2015 [1].
Li, Y., Guo, P. Training neural networks by marginalizing out hidden layer noise. Neural Comput & Applic 29, 401–412 (2018). https://doi.org/10.1007/s00521-017-2864-4