Abstract
Recurrent neural networks (RNNs) have recently gained renewed attention from the machine learning community as effective methods for modeling variable-length sequences. Language modeling, handwriting recognition, and speech recognition are only a few of the application domains where RNN-based models have achieved state-of-the-art performance. Typically, RNN architectures employ simple linear, logistic, or softmax output layers to perform data modeling and prediction. In this work, for the first time in the literature, we consider using a sparse Bayesian regression or classification model as the output layer of RNNs, inspired by the automatic relevance determination (ARD) technique. The notion of ARD is to continually create new components while detecting when a component starts to overfit, where overfitting manifests itself as a precision hyperparameter posterior tending to infinity. In this way, our method trains sparse RNN models in which the number of effective ("active") recurrently connected hidden units is selected in a data-driven fashion, as part of the model inference procedure. We develop efficient and scalable training algorithms for our model under the stochastic variational inference paradigm, and derive elegant predictive density expressions with computational costs comparable to those of conventional RNN formulations. We evaluate our approach on challenging regression and classification tasks, and demonstrate its favorable performance compared to the state of the art.
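To make the ARD mechanism in the abstract concrete, the sketch below applies the classic evidence-approximation (MacKay/Tipping-style) fixed-point updates to a linear output layer fitted on a fixed matrix of hidden-unit activations: each unit gets its own prior precision, and units whose posterior precision diverges are pruned. This is only a minimal illustration of the ARD pruning idea; it does not reproduce the paper's stochastic variational inference procedure, and all names (`ard_output_layer`, `prune_at`, etc.) are hypothetical.

```python
import numpy as np

def ard_output_layer(H, y, n_iter=50, alpha_init=1.0, prune_at=1e6):
    """ARD-style sparse Bayesian linear regression on hidden activations.

    H : (N, D) matrix of recurrent hidden-unit activations (features).
    y : (N,) regression targets.
    Each column d carries its own prior precision alpha[d]; columns whose
    precision grows past `prune_at` are treated as "overfitting" and pruned,
    which is how ARD deactivates superfluous hidden units.
    """
    N, D = H.shape
    alpha = np.full(D, alpha_init)          # per-unit prior precisions
    beta = 1.0 / max(np.var(y), 1e-12)      # noise precision (initial guess)
    active = np.arange(D)                   # indices of still-active units

    for _ in range(n_iter):
        Ha = H[:, active]
        # Gaussian posterior over the active weights: N(mu, Sigma)
        Sigma = np.linalg.inv(beta * Ha.T @ Ha + np.diag(alpha[active]))
        mu = beta * Sigma @ Ha.T @ y
        # Effective number of well-determined parameters per unit
        gamma = 1.0 - alpha[active] * np.diag(Sigma)
        # Fixed-point updates for the precision hyperparameters
        alpha[active] = gamma / (mu ** 2 + 1e-12)
        resid = y - Ha @ mu
        beta = max(N - gamma.sum(), 1e-12) / (resid @ resid + 1e-12)
        # Prune units whose precision has effectively diverged
        active = active[alpha[active] < prune_at]

    # Final posterior over the surviving units
    Ha = H[:, active]
    Sigma = np.linalg.inv(beta * Ha.T @ Ha + np.diag(alpha[active]))
    mu = beta * Sigma @ Ha.T @ y
    return active, mu, beta
```

A small synthetic check, where only three of thirty "hidden units" actually carry signal, typically recovers exactly those columns:

```python
rng = np.random.default_rng(0)
N, D = 200, 30
H = np.tanh(rng.standard_normal((N, D)))   # stand-in for RNN hidden states
w_true = np.zeros(D)
w_true[[2, 7, 11]] = [1.5, -2.0, 0.8]
y = H @ w_true + 0.1 * rng.standard_normal(N)

active, mu, beta = ard_output_layer(H, y)
print("surviving units:", active)          # typically [2, 7, 11]
```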
Keywords
- Recurrent Neural Network
- Hidden Unit
- Novelty Detection
- Stochastic Gradient Descent
- Automatic Relevance Determination
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chatzis, S.P. (2015). Sparse Bayesian Recurrent Neural Networks. In: Appice, A., Rodrigues, P., Santos Costa, V., Gama, J., Jorge, A., Soares, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science, vol 9285. Springer, Cham. https://doi.org/10.1007/978-3-319-23525-7_22
DOI: https://doi.org/10.1007/978-3-319-23525-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23524-0
Online ISBN: 978-3-319-23525-7