Abstract
This chapter describes the use of recurrent neural networks (networks in which feedback is incorporated into the computation) as an acoustic model for continuous speech recognition. The form of the recurrent network is described along with an appropriate parameter estimation procedure. For each frame of acoustic data, the recurrent network generates an estimate of the posterior probability of each possible phone given the observed acoustic signal. These posteriors are then converted into scaled likelihoods and used as the observation probabilities within a conventional decoding paradigm (e.g., Viterbi decoding). The advantages of recurrent networks are that they require relatively few parameters and provide fast decoding (relative to conventional, large-vocabulary HMM systems) [3].
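The posterior-to-likelihood conversion the abstract mentions follows from Bayes' rule: the network output P(phone | frame) divided by the class prior P(phone) is proportional to the likelihood p(frame | phone), since the frame probability p(frame) is common to all phones and cancels in decoding. The sketch below illustrates this on toy data with a minimal Viterbi pass; the phone counts, prior values, and uniform transition probabilities are all invented for illustration and are not taken from the chapter.

```python
import math
import random

# Hypothetical toy setup: 3 phone classes, 5 acoustic frames.
random.seed(0)
num_phones, num_frames = 3, 5

# Stand-ins for network outputs: P(phone | frame), one row per frame.
posteriors = []
for _ in range(num_frames):
    raw = [random.random() for _ in range(num_phones)]
    total = sum(raw)
    posteriors.append([r / total for r in raw])
priors = [0.5, 0.3, 0.2]  # P(phone), e.g. estimated from training-label frequencies

# Scaled likelihoods via Bayes' rule: p(x|q) = P(q|x) p(x) / P(q).
# The shared p(x) factor cancels in Viterbi decoding, so dividing each
# posterior by its class prior is sufficient.
scaled = [[p / pr for p, pr in zip(frame, priors)] for frame in posteriors]

# Minimal Viterbi decode with uniform transitions (illustration only).
log_trans = math.log(1.0 / num_phones)
delta = [math.log(s) for s in scaled[0]]
back = []
for t in range(1, num_frames):
    new_delta, pointers = [], []
    for j in range(num_phones):
        best_i = max(range(num_phones), key=lambda i: delta[i] + log_trans)
        pointers.append(best_i)
        new_delta.append(delta[best_i] + log_trans + math.log(scaled[t][j]))
    delta = new_delta
    back.append(pointers)

# Trace back the best state sequence.
path = [max(range(num_phones), key=lambda j: delta[j])]
for pointers in reversed(back):
    path.append(pointers[path[-1]])
path.reverse()
print(path)  # most likely phone index per frame under the toy model
```

In a real hybrid system the observation probabilities in the decoder would simply be replaced by these scaled likelihoods, leaving the rest of the HMM decoding machinery unchanged.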
References
H. F. Silverman and D. P. Morgan, “The application of dynamic programming to connected speech recognition,” IEEE ASSP Magazine, vol. 7, pp. 6–25, July 1990.
L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, pp. 257–286, February 1989.
N. Morgan and H. Bourlard, “Continuous speech recognition using multilayer perceptrons with hidden Markov models,” in Proc. ICASSP, pp. 413–416, 1990.
S. Renals, N. Morgan, H. Bourlard, M. Cohen, and H. Franco, “Connectionist probability estimators in HMM speech recognition,” IEEE Transactions on Speech and Audio Processing, vol. 2, Jan. 1994.
F. Jelinek and R. Mercer, “Interpolated estimation of Markov source parameters from sparse data,” Pattern Recognition in Practice, pp. 381–397, 1980.
K.-F. Lee, Automatic Speech Recognition: The Development of the SPHINX System. Boston: Kluwer Academic Publishers, 1989.
S. Furui, “Speaker-independent isolated word recognition using dynamic features of speech spectrum,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, pp. 52–59, Feb. 1986.
E. B. Baum and F. Wilczek, “Supervised learning of probability distributions by neural networks,” in Neural Information Processing Systems (D. Z. Anderson, ed.), American Institute of Physics, 1988.
J. S. Bridle, “Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition,” in Neurocomputing: Algorithms, Architectures and Applications (F. Fogelman-Soulié and J. Hérault, eds.), pp. 227–236, Springer-Verlag, 1989.
H. Bourlard and C. J. Wellekens, “Links between Markov models and multilayer perceptrons,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 1167–1178, Dec. 1990.
H. Gish, “A probabilistic approach to the understanding and training of neural network classifiers,” in Proc. ICASSP, pp. 1361–1364, 1990.
M. D. Richard and R. P. Lippmann, “Neural network classifiers estimate Bayesian a posteriori probabilities,” Neural Computation, vol. 3, pp. 461–483, 1991.
H. Bourlard and N. Morgan, Connectionist Speech Recognition — A Hybrid Approach. Kluwer Academic Publishers, 1994.
J. S. Bridle, “Alpha-Nets: A recurrent ‘neural’ network architecture with a hidden Markov model interpretation,” Speech Communication, vol. 9, pp. 83–92, Feb. 1990.
J. S. Bridle and L. Dodd, “An Alphanet approach to optimising input transformations for continuous speech recognition,” in Proc. ICASSP, pp. 277–280, 1991.
L. T. Niles and H. F. Silverman, “Combining hidden Markov models and neural network classifiers,” in Proc. ICASSP, pp. 417–420, 1990.
S. J. Young, “Competitive training in hidden Markov models,” in Proc. ICASSP, pp. 681–684, 1990. Expanded in Tech. Rep. CUED/F-INFENG/TR.41, Cambridge University Engineering Department.
A. J. Robinson and F. Fallside, “Static and dynamic error propagation networks with application to speech coding,” in Neural Information Processing Systems (D. Z. Anderson, ed.), American Institute of Physics, 1988.
P. McCullagh and J. A. Nelder, Generalized Linear Models. London: Chapman and Hall, 1983.
T. Robinson, “The state space and ‘ideal input’ representations of recurrent networks,” in Visual Representations of Speech Signals, pp. 327–334, John Wiley and Sons, 1993.
A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm (with discussion),” J. Roy. Statist. Soc., vol. B39, pp. 1–38, 1977.
D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” in Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. I: Foundations. (D. E. Rumelhart and J. L. McClelland, eds.), ch. 8, Cambridge, MA: Bradford Books/MIT Press, 1986.
P. J. Werbos, “Backpropagation through time: What it does and how to do it,” Proc. IEEE, vol. 78, pp. 1550–1560, Oct. 1990.
R. A. Jacobs, “Increased rates of convergence through learning rate adaptation,” Neural Networks, vol. 1, pp. 295–307, 1988.
W. Schiffmann, M. Joost, and R. Werner, “Optimization of the backpropagation algorithm for training multilayer perceptrons,” tech. rep., University of Koblenz, 1992.
T. T. Jervis and W. J. Fitzgerald, “Optimization schemes for neural networks,” Tech. Rep. CUED/F-INFENG/TR144, Cambridge University Engineering Department, Aug. 1993.
M. M. Hochberg, S. J. Renals, A. J. Robinson, and D. J. Kershaw, “Large vocabulary continuous speech recognition using a hybrid connectionist-HMM system,” in Proc. of ICSLP-94, pp. 1499–1502, 1994.
M. M. Hochberg, G. D. Cook, S. J. Renals, and A. J. Robinson, “Connectionist model combination for large vocabulary speech recognition,” in Neural Networks for Signal Processing IV (J. Vlontzos, J.-N. Hwang, and E. Wilson, eds.), pp. 269–278, IEEE, 1994.
T. H. Crystal and A. S. House, “Segmental durations in connected-speech signals: Current results,” J. Acoust. Soc. Am., vol. 83, pp. 1553–1573, Apr. 1988.
L. R. Bahl and F. Jelinek, “Apparatus and method for determining a likely word sequence from labels generated by an acoustic processor.” US Patent 4,748,670, May 1988.
D. B. Paul, “An efficient A* stack decoder algorithm for continuous speech recognition with a stochastic language model,” in Proc. ICASSP, vol. 1, (San Francisco), pp. 25–28, 1992.
S. J. Renals and M. M. Hochberg, “Decoder technology for connectionist large vocabulary speech recognition,” Tech. Rep. CUED/F-INFENG/TR.186, Cambridge University Engineering Department, 1994.
S. Renals and M. Hochberg, “Efficient search using posterior phone probability estimates,” in Proc. ICASSP, pp. 596–599, 1995.
P. S. Gopalakrishnan, D. Nahamoo, M. Padmanabhan, and M. A. Picheny, “A channel-bank-based phone detection strategy,” in Proc. ICASSP, vol. 2, (Adelaide), pp. 161–164, 1994.
Copyright information
© 1996 Kluwer Academic Publishers
Cite this chapter
Robinson, T., Hochberg, M., Renals, S. (1996). The Use of Recurrent Neural Networks in Continuous Speech Recognition. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_10
Print ISBN: 978-1-4612-8590-8
Online ISBN: 978-1-4613-1367-0