Abstract
This chapter describes the use of recurrent neural networks (networks in which feedback is incorporated into the computation) as an acoustic model for continuous speech recognition. The form of the recurrent network is described along with an appropriate parameter estimation procedure. For each frame of acoustic data, the recurrent network generates an estimate of the posterior probability of each possible phone given the observed acoustic signal. These posteriors are then converted into scaled likelihoods and used as the observation probabilities within a conventional decoding paradigm (e.g., Viterbi decoding). The advantages of recurrent networks are that they require relatively few parameters and provide fast decoding (relative to conventional, large-vocabulary HMM systems) [3].
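The posterior-to-likelihood conversion the abstract mentions follows from Bayes' rule: the network output P(phone | frame) divided by the class prior P(phone) is proportional to the likelihood p(frame | phone), since the frame probability p(frame) is common to all phones and cancels in decoding. The sketch below illustrates this on toy data with a minimal Viterbi pass; the phone counts, prior values, and uniform transition probabilities are all invented for illustration and are not taken from the chapter.

```python
import math
import random

# Hypothetical toy setup: 3 phone classes, 5 acoustic frames.
random.seed(0)
num_phones, num_frames = 3, 5

# Stand-ins for network outputs: P(phone | frame), one row per frame.
posteriors = []
for _ in range(num_frames):
    raw = [random.random() for _ in range(num_phones)]
    total = sum(raw)
    posteriors.append([r / total for r in raw])
priors = [0.5, 0.3, 0.2]  # P(phone), e.g. estimated from training-label frequencies

# Scaled likelihoods via Bayes' rule: p(x|q) = P(q|x) p(x) / P(q).
# The shared p(x) factor cancels in Viterbi decoding, so dividing each
# posterior by its class prior is sufficient.
scaled = [[p / pr for p, pr in zip(frame, priors)] for frame in posteriors]

# Minimal Viterbi decode with uniform transitions (illustration only).
log_trans = math.log(1.0 / num_phones)
delta = [math.log(s) for s in scaled[0]]
back = []
for t in range(1, num_frames):
    new_delta, pointers = [], []
    for j in range(num_phones):
        best_i = max(range(num_phones), key=lambda i: delta[i] + log_trans)
        pointers.append(best_i)
        new_delta.append(delta[best_i] + log_trans + math.log(scaled[t][j]))
    delta = new_delta
    back.append(pointers)

# Trace back the best state sequence.
path = [max(range(num_phones), key=lambda j: delta[j])]
for pointers in reversed(back):
    path.append(pointers[path[-1]])
path.reverse()
print(path)  # most likely phone index per frame under the toy model
```

In a real hybrid system the observation probabilities in the decoder would simply be replaced by these scaled likelihoods, leaving the rest of the HMM decoding machinery unchanged.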
References
H. F. Silverman and D. P. Morgan, “The application of dynamic programming to connected speech recognition,” IEEE ASSP Magazine, vol. 7, pp. 6–25, July 1990.
L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, pp. 257–286, February 1989.
N. Morgan and H. Bourlard, “Continuous speech recognition using multilayer perceptrons with hidden Markov models,” in Proc. ICASSP, pp. 413–416, 1990.
S. Renals, N. Morgan, H. Bourlard, M. Cohen, and H. Franco, “Connectionist probability estimators in HMM speech recognition,” IEEE Transactions on Speech and Audio Processing, vol. 2, Jan. 1994.
F. Jelinek and R. Mercer, “Interpolated estimation of Markov source parameters from sparse data,” Pattern Recognition in Practice, pp. 381–397, 1980.
K.-F. Lee, Automatic Speech Recognition: The Development of the SPHINX System. Boston: Kluwer Academic Publishers, 1989.
S. Furui, “Speaker-independent isolated word recognition using dynamic features of speech spectrum,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, pp. 52–59, Feb. 1986.
E. B. Baum and F. Wilczek, “Supervised learning of probability distributions by neural networks,” in Neural Information Processing Systems (D. Z. Anderson, ed.), American Institute of Physics, 1988.
J. S. Bridle, “Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition,” in Neurocomputing: Algorithms, Architectures and Applications (F. Fogelman-Soulié and J. Hérault, eds.), pp. 227–236, Springer-Verlag, 1989.
H. Bourlard and C. J. Wellekens, “Links between Markov models and multilayer perceptrons,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 1167–1178, Dec. 1990.
H. Gish, “A probabilistic approach to the understanding and training of neural network classifiers,” in Proc. ICASSP, pp. 1361–1364, 1990.
M. D. Richard and R. P. Lippmann, “Neural network classifiers estimate Bayesian a posteriori probabilities,” Neural Computation, vol. 3, pp. 461–483, 1991.
H. Bourlard and N. Morgan, Connectionist Speech Recognition — A Hybrid Approach. Kluwer Academic Publishers, 1994.
J. S. Bridle, “Alpha-Nets: A recurrent ‘neural’ network architecture with a hidden Markov model interpretation,” Speech Communication, vol. 9, pp. 83–92, Feb. 1990.
J. S. Bridle and L. Dodd, “An Alphanet approach to optimising input transformations for continuous speech recognition,” in Proc. ICASSP, pp. 277–280, 1991.
L. T. Niles and H. F. Silverman, “Combining hidden Markov models and neural network classifiers,” in Proc. ICASSP, pp. 417–420, 1990.
S. J. Young, “Competitive training in hidden Markov models,” in Proc. ICASSP, pp. 681–684, 1990. Expanded in Tech. Rep. CUED/F-INFENG/TR.41, Cambridge University Engineering Department.
A. J. Robinson and F. Fallside, “Static and dynamic error propagation networks with application to speech coding,” in Neural Information Processing Systems (D. Z. Anderson, ed.), American Institute of Physics, 1988.
P. McCullagh and J. A. Nelder, Generalized Linear Models. London: Chapman and Hall, 1983.
T. Robinson, “The state space and ‘ideal input’ representations of recurrent networks,” in Visual Representations of Speech Signals, pp. 327–334, John Wiley and Sons, 1993.
A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm (with discussion),” J. Roy. Statist. Soc., vol. B39, pp. 1–38, 1977.
D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” in Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. I: Foundations. (D. E. Rumelhart and J. L. McClelland, eds.), ch. 8, Cambridge, MA: Bradford Books/MIT Press, 1986.
P. J. Werbos, “Backpropagation through time: What it does and how to do it,” Proc. IEEE, vol. 78, pp. 1550–1560, Oct. 1990.
R. A. Jacobs, “Increased rates of convergence through learning rate adaptation,” Neural Networks, vol. 1, pp. 295–307, 1988.
W. Schiffmann, M. Joost, and R. Werner, “Optimization of the backpropagation algorithm for training multilayer perceptrons,” tech. rep., University of Koblenz, 1992.
T. T. Jervis and W. J. Fitzgerald, “Optimization schemes for neural networks,” Tech. Rep. CUED/F-INFENG/TR144, Cambridge University Engineering Department, Aug. 1993.
M. M. Hochberg, S. J. Renals, A. J. Robinson, and D. J. Kershaw, “Large vocabulary continuous speech recognition using a hybrid connectionist-HMM system,” in Proc. of ICSLP-94, pp. 1499–1502, 1994.
M. M. Hochberg, G. D. Cook, S. J. Renals, and A. J. Robinson, “Connectionist model combination for large vocabulary speech recognition,” in Neural Networks for Signal Processing IV (J. Vlontzos, J.-N. Hwang, and E. Wilson, eds.), pp. 269–278, IEEE, 1994.
T. H. Crystal and A. S. House, “Segmental durations in connected-speech signals: Current results,” J. Acoust. Soc. Am., vol. 83, pp. 1553–1573, Apr. 1988.
L. R. Bahl and F. Jelinek, “Apparatus and method for determining a likely word sequence from labels generated by an acoustic processor.” US Patent 4,748,670, May 1988.
D. B. Paul, “An efficient A* stack decoder algorithm for continuous speech recognition with a stochastic language model,” in Proc. ICASSP, vol. 1, (San Francisco), pp. 25–28, 1992.
S. J. Renals and M. M. Hochberg, “Decoder technology for connectionist large vocabulary speech recognition,” Tech. Rep. CUED/F-INFENG/TR.186, Cambridge University Engineering Department, 1994.
S. Renals and M. Hochberg, “Efficient search using posterior phone probability estimates,” in Proc. ICASSP, pp. 596–599, 1995.
P. S. Gopalakrishnan, D. Nahamoo, M. Padmanabhan, and M. A. Picheny, “A channel-bank-based phone detection strategy,” in Proc. ICASSP, vol. 2, (Adelaide), pp. 161–164, 1994.
Copyright information
© 1996 Kluwer Academic Publishers
Cite this chapter
Robinson, T., Hochberg, M., Renals, S. (1996). The Use of Recurrent Neural Networks in Continuous Speech Recognition. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_10
Print ISBN: 978-1-4612-8590-8
Online ISBN: 978-1-4613-1367-0