Predictive Connectionist Approach to Speech Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3445)

Abstract

This tutorial describes a context-dependent Hidden Control Neural Network (HCNN) architecture for large vocabulary continuous speech recognition. Its basic building element, the context-dependent HCNN model, is a connectionist network trained to capture the dynamics of sub-word units of speech. The described HCNN model belongs to a family of Hidden Markov Model/Multi-Layer Perceptron (HMM/MLP) hybrids, usually referred to as Predictive Neural Networks [1]. The model is trained to generate continuous real-valued output vector predictions, rather than to estimate maximum a posteriori (MAP) probabilities as in pattern classification. Explicit context-dependent modeling is introduced to refine the baseline HCNN model for continuous speech recognition. The extended HCNN system was initially evaluated on the Conference Registration Database of CMU. On the same task, the HCNN modeling yielded better generalization performance than the Linked Predictive Neural Networks (LPNN). Additionally, several optimizations were possible when implementing the HCNN system. The tutorial concludes with a discussion of future research on the predictive connectionist approach to speech recognition.
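
The predictive idea summarized in the abstract can be illustrated with a minimal sketch. It is not the implementation described in the paper: the single hidden layer, the tanh nonlinearity, the one-hot encoding of the hidden control state, the left-to-right alignment, and all names (`init_params`, `predict`, `prediction_error`, `align_cost`) are assumptions made for illustration. The sketch shows a network that predicts the next speech frame from the previous frame plus a control code, and uses the squared prediction error, not a class posterior, as the local score accumulated along a state path.

```python
import math
import random


def init_params(frame_dim, n_states, hidden_dim, seed=0):
    """Small random weights for a one-hidden-layer predictor (illustrative only)."""
    rng = random.Random(seed)
    in_dim = frame_dim + n_states  # previous frame + one-hot control code

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": mat(hidden_dim, in_dim), "b1": [0.0] * hidden_dim,
        "W2": mat(frame_dim, hidden_dim), "b2": [0.0] * frame_dim,
        "n_states": n_states,
    }


def predict(params, prev_frame, state):
    """Predict the next frame from the previous frame and the control state."""
    # Assumption: the hidden control input is a one-hot code of the state index.
    control = [1.0 if s == state else 0.0 for s in range(params["n_states"])]
    x = list(prev_frame) + control
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
         for row, b in zip(params["W1"], params["b1"])]
    # Linear output layer: a real-valued vector, not class probabilities.
    return [sum(w * v for w, v in zip(row, h)) + b
            for row, b in zip(params["W2"], params["b2"])]


def prediction_error(pred, frame):
    """Squared Euclidean distance between predicted and observed frame."""
    return sum((p - f) ** 2 for p, f in zip(pred, frame))


def align_cost(params, frames):
    """Minimum accumulated prediction error over left-to-right state paths.

    A path starts in state 0, may stay or advance by one state per frame,
    and must end in the last state. Returns inf if no such path exists.
    """
    n_states = params["n_states"]
    inf = float("inf")
    # cost[s]: best accumulated error of a path in state s at the current frame
    cost = [0.0] + [inf] * (n_states - 1)
    for t in range(1, len(frames)):
        new_cost = [inf] * n_states
        for s in range(n_states):
            best_prev = min(cost[s], cost[s - 1] if s > 0 else inf)
            if best_prev < inf:
                err = prediction_error(predict(params, frames[t - 1], s), frames[t])
                new_cost[s] = best_prev + err
        cost = new_cost
    return cost[-1]
```

In a recognizer built along these lines, each candidate unit would have its own control-state sequence, and the unit whose alignment yields the lowest accumulated prediction error would be preferred. Because one network is shared across states and only the control code changes, the same weights serve every state of the model in this sketch.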

References

  1. Bourlard, H.A., Morgan, N.: Connectionist Speech Recognition: a Hybrid Approach. Kluwer Academic Publishers, Dordrecht (1994)

  2. Lapedes, A., Farber, R.: Nonlinear Signal Processing Using Neural Networks: Prediction and System Modelling. Technical Report LA-UR-87-2662, Los Alamos National Laboratory (1987)

  3. Iso, K.: Speech Recognition Using Neural Prediction Model. IEICE Technical Report SP89-23, 81–87 (1989)

  4. Iso, K., Watanabe, T.: Speaker-Independent Word Recognition Using a Neural Prediction Model. In: Proc. IEEE Int. Conf. on ASSP, pp. 441–444 (1990)

  5. Iso, K., Watanabe, T.: Speech Recognition Using Demi-Syllable Neural Prediction Model. Advances in Neural Information Processing Systems 3, 227–233 (1991)

  6. Iso, K., Watanabe, T.: Large Vocabulary Speech Recognition Using Neural Prediction Model. In: Proc. IEEE Int. Conf. on ASSP, pp. 57–60 (1991)

  7. Levin, E.: Word Recognition Using Hidden Control Neural Architecture. In: Proc. Speech Tech 1990, pp. 20–25 (1990)

  8. Levin, E.: Word Recognition Using Hidden Control Neural Architecture. In: Proc. IEEE Int. Conf. on ASSP, pp. 433–436 (1990)

  9. Levin, E.: Modeling Time Varying Systems Using a Hidden Control Neural Network Architecture. Advances in Neural Information Processing Systems 3, 147–154 (1991)

  10. Tebelskis, J., Waibel, A.: Large Vocabulary Recognition Using Linked Predictive Neural Networks. In: Proc. IEEE Int. Conf. on ASSP, pp. 437–440 (1990)

  11. Tebelskis, J., Waibel, A., Petek, B., Schmidbauer, O.: Continuous Speech Recognition by Linked Predictive Neural Networks. Advances in Neural Information Processing Systems 3, 199–205 (1991)

  12. Tebelskis, J., Waibel, A., Petek, B., Schmidbauer, O.: Continuous Speech Recognition Using Linked Predictive Neural Networks. In: Proc. IEEE Int. Conf. on ASSP, pp. 61–64 (1991)

  13. Tebelskis, J.: Speech Recognition using Neural Networks. PhD thesis, School of Computer Science, Pittsburgh, PA (1995)

  14. Tishby, N.: A Dynamical Systems Approach to Speech Processing. In: Proc. IEEE Int. Conf. on ASSP, pp. 365–368 (1990)

  15. Cybenko, G.: Approximation by Superpositions of a Sigmoidal Function. Technical report CSRD 856, University of Illinois (1989)

  16. Funahashi, K.: On the Approximate Realization of Continuous Mappings by Neural Networks. Neural Networks 2, 183–192 (1989)

  17. Hornik, K., Stinchcombe, M., White, H.: Multi-Layer Feedforward Networks are Universal Approximators. Technical Report UCSD (1989)

  18. Hornik, K.: Approximation Capabilities of Multilayer Feedforward Networks. Neural Networks 4, 251–257 (1991)

  19. McClelland, J.L., Rumelhart, D.E.: The PDP research group: Parallel Distributed Processing, vol. 2, ch. 18, pp. 217–268. MIT Press, Cambridge (1986)

  20. Lee, K.F.: Large Vocabulary Speaker Independent Continuous Speech Recognition: the SPHINX System. PhD dissertation, Computer Science Department, Carnegie Mellon University (1988)

  21. Ney, H.: The Use of a One-Stage Dynamic Programming Algorithm for Connected Word Recognition. IEEE Trans. on ASSP 32(2), 263–271 (1984)

  22. Schmidbauer, O., Tebelskis, J.: An LVQ Based Reference Model for Speaker-Adaptive Speech Recognition. In: IEEE Int. Conf. on ASSP, vol. 1, pp. 441–445 (1992)

  23. Kohonen, T., Barna, G., Chrisley, R.: Statistical Pattern Recognition with Neural Networks: Benchmarking Studies. In: Proc. IEEE Int. Conf. on Neural Networks, pp. 61–66 (1988)

  24. Mellouk, A., Gallinari, P.: A Discriminative Neural Prediction System for Speech Recognition. In: Proc. IEEE Int. Conf. on ASSP, pp. 533–536 (1993)

  25. Mellouk, A., Gallinari, P.: Discriminative Training for Improved Neural Prediction Systems. In: Proc. IEEE Int. Conf. on ASSP, pp. I 233–236 (1994)

  26. Mellouk, A., Gallinari, P.: Global Discrimination for Neural Predictive Systems based on N-best algorithm. In: Proc. IEEE Int. Conf. on ASSP, pp. 465–468 (1995)

  27. Gallinari, P.: Predictive Models for Sequence Modelling, Application to Speech and Character Recognition (2004), http://citeseer.ist.psu.edu/28957.html (accessed October 2004)

  28. NATO ASI on Dynamics of Speech Production and Perception. Kluwer Academic Publishers, Dordrecht (2002)

  29. Deng, L., Huang, X.: Challenges in Adopting Speech Recognition. Comm. of the ACM 47(1), 69–75 (2004)

  30. Forbes, B.J., Pike, E.R.: Acoustical Klein-Gordon Equation: A Time-Independent Perturbation Analysis. Phys. Rev. Lett. 93, 054301 (2004)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Petek, B. (2005). Predictive Connectionist Approach to Speech Recognition. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_10

  • DOI: https://doi.org/10.1007/11520153_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27441-4

  • Online ISBN: 978-3-540-31886-6

  • eBook Packages: Computer Science, Computer Science (R0)
