Abstract
Recurrent neural networks (RNNs) have recently gained renewed attention from the machine learning community as effective methods for modeling variable-length sequences. Language modeling, handwriting recognition, and speech recognition are only a few of the application domains where RNN-based models have achieved state-of-the-art performance. Typically, RNN architectures employ simple linear, logistic, or softmax output layers to perform data modeling and prediction. In this work, for the first time in the literature, we consider using a sparse Bayesian regression or classification model as the output layer of RNNs, inspired by the automatic relevance determination (ARD) technique. The notion of ARD is to continually create new components while detecting when a component starts to overfit, where overfitting manifests itself as a precision hyperparameter posterior tending to infinity. In this way, our method trains sparse RNN models in which the number of effective ("active") recurrently connected hidden units is selected in a data-driven fashion, as part of the model inference procedure. We develop efficient and scalable training algorithms for our model under the stochastic variational inference paradigm, and derive elegant predictive density expressions with computational costs comparable to those of conventional RNN formulations. We evaluate our approach on challenging regression and classification tasks, and demonstrate its favorable performance compared to the state of the art.
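To make the ARD mechanism in the abstract concrete, the sketch below applies the classic evidence-approximation (MacKay/Tipping-style) fixed-point updates to a linear output layer fitted on a fixed matrix of hidden-unit activations: each unit gets its own prior precision, and units whose posterior precision diverges are pruned. This is only a minimal illustration of the ARD pruning idea; it does not reproduce the paper's stochastic variational inference procedure, and all names (`ard_output_layer`, `prune_at`, etc.) are hypothetical.

```python
import numpy as np

def ard_output_layer(H, y, n_iter=50, alpha_init=1.0, prune_at=1e6):
    """ARD-style sparse Bayesian linear regression on hidden activations.

    H : (N, D) matrix of recurrent hidden-unit activations (features).
    y : (N,) regression targets.
    Each column d carries its own prior precision alpha[d]; columns whose
    precision grows past `prune_at` are treated as "overfitting" and pruned,
    which is how ARD deactivates superfluous hidden units.
    """
    N, D = H.shape
    alpha = np.full(D, alpha_init)          # per-unit prior precisions
    beta = 1.0 / max(np.var(y), 1e-12)      # noise precision (initial guess)
    active = np.arange(D)                   # indices of still-active units

    for _ in range(n_iter):
        Ha = H[:, active]
        # Gaussian posterior over the active weights: N(mu, Sigma)
        Sigma = np.linalg.inv(beta * Ha.T @ Ha + np.diag(alpha[active]))
        mu = beta * Sigma @ Ha.T @ y
        # Effective number of well-determined parameters per unit
        gamma = 1.0 - alpha[active] * np.diag(Sigma)
        # Fixed-point updates for the precision hyperparameters
        alpha[active] = gamma / (mu ** 2 + 1e-12)
        resid = y - Ha @ mu
        beta = max(N - gamma.sum(), 1e-12) / (resid @ resid + 1e-12)
        # Prune units whose precision has effectively diverged
        active = active[alpha[active] < prune_at]

    # Final posterior over the surviving units
    Ha = H[:, active]
    Sigma = np.linalg.inv(beta * Ha.T @ Ha + np.diag(alpha[active]))
    mu = beta * Sigma @ Ha.T @ y
    return active, mu, beta
```

A small synthetic check, where only three of thirty "hidden units" actually carry signal, typically recovers exactly those columns:

```python
rng = np.random.default_rng(0)
N, D = 200, 30
H = np.tanh(rng.standard_normal((N, D)))   # stand-in for RNN hidden states
w_true = np.zeros(D)
w_true[[2, 7, 11]] = [1.5, -2.0, 0.8]
y = H @ w_true + 0.1 * rng.standard_normal(N)

active, mu, beta = ard_output_layer(H, y)
print("surviving units:", active)          # typically [2, 7, 11]
```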
Keywords
- Recurrent Neural Network
- Hidden Unit
- Novelty Detection
- Stochastic Gradient Descent
- Automatic Relevance Determination
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chatzis, S.P. (2015). Sparse Bayesian Recurrent Neural Networks. In: Appice, A., Rodrigues, P., Santos Costa, V., Gama, J., Jorge, A., Soares, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science, vol 9285. Springer, Cham. https://doi.org/10.1007/978-3-319-23525-7_22
DOI: https://doi.org/10.1007/978-3-319-23525-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23524-0
Online ISBN: 978-3-319-23525-7