Predictive Connectionist Approach to Speech Recognition

Petek, Bojan

doi:10.1007/11520153_10

Bojan Petek

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3445)

Abstract

This tutorial describes a context-dependent Hidden Control Neural Network (HCNN) architecture for large vocabulary continuous speech recognition. Its basic building element, the context-dependent HCNN model, is a connectionist network trained to capture the dynamics of sub-word units of speech. The described HCNN model belongs to a family of Hidden Markov Model/Multi-Layer Perceptron (HMM/MLP) hybrids, usually referred to as Predictive Neural Networks [1]. The model is trained to generate continuous real-valued output vector predictions rather than to estimate maximum a posteriori (MAP) probabilities, as is done in pattern classification. Explicit context-dependent modeling is introduced to refine the baseline HCNN model for continuous speech recognition. The extended HCNN system was initially evaluated on the Conference Registration Database of CMU. On the same task, HCNN modeling yielded better generalization performance than the Linked Predictive Neural Networks (LPNN). Additionally, several optimizations were possible when implementing the HCNN system. The tutorial concludes with a discussion of future research on predictive connectionist approaches to speech recognition.
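The abstract does not spell out the network itself, so the following is only a minimal sketch of the general predictive idea, not the author's implementation. It assumes a single one-hidden-layer network that predicts the current feature frame from the two preceding frames plus a one-hot control code for the current state, and it uses the squared prediction error as the local cost in a left-to-right dynamic programming alignment. All names (`make_hcnn`, `predict`, `prediction_error`, `align`), the layer sizes, the two-frame context, and the random untrained weights are hypothetical choices made for illustration.

```python
import math
import random


def make_hcnn(n_feat, n_states, n_hidden, seed=0):
    """Build a randomly initialised predictor (untrained, illustration only)."""
    rng = random.Random(seed)
    n_in = 2 * n_feat + n_states  # two previous frames + one-hot control code
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_feat)]
    return {"w1": w1, "w2": w2, "n_states": n_states}


def predict(net, prev2, prev1, state):
    """Predict the current frame from two previous frames and a control state."""
    control = [1.0 if s == state else 0.0 for s in range(net["n_states"])]
    x = prev2 + prev1 + control
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in net["w1"]]
    return [sum(w * h for w, h in zip(row, hidden)) for row in net["w2"]]


def prediction_error(net, frames, t, state):
    """Squared Euclidean distance between the predicted and observed frame t."""
    pred = predict(net, frames[t - 2], frames[t - 1], state)
    return sum((p - o) ** 2 for p, o in zip(pred, frames[t]))


def align(net, frames):
    """Pick one control state per frame (left-to-right, no skips) so that
    the summed prediction error is minimal."""
    n_states = net["n_states"]
    times = list(range(2, len(frames)))  # the first two frames serve as context
    inf = float("inf")
    cost = [[inf] * n_states for _ in times]
    back = [[0] * n_states for _ in times]
    cost[0][0] = prediction_error(net, frames, times[0], 0)
    for i in range(1, len(times)):
        for s in range(n_states):
            stay = cost[i - 1][s]
            move = cost[i - 1][s - 1] if s > 0 else inf
            best, back[i][s] = (stay, s) if stay <= move else (move, s - 1)
            if best < inf:
                cost[i][s] = best + prediction_error(net, frames, times[i], s)
    # Trace the best path back from the final state.
    s = n_states - 1
    path = [s]
    for i in range(len(times) - 1, 0, -1):
        s = back[i][s]
        path.append(s)
    path.reverse()
    return cost[-1][n_states - 1], path


# Toy input: ten synthetic 3-dimensional frames standing in for speech features.
frames = [[math.sin(0.3 * t + k) for k in range(3)] for t in range(10)]
net = make_hcnn(n_feat=3, n_states=3, n_hidden=4)
total_error, state_path = align(net, frames)
print(total_error, state_path)
```

In a trained system of this kind, the accumulated prediction error would play the role that the negative log-likelihood plays in an HMM: the unit model whose predictions best match the observed frames gives the lowest total error. The sketch omits training, context-dependent modeling, and any of the optimizations mentioned in the abstract.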

References

Bourlard, H.A., Morgan, N.: Connectionist Speech Recognition: a Hybrid Approach. Kluwer Academic Publishers, Dordrecht (1994)
Lapedes, A., Farber, R.: Nonlinear Signal Processing Using Neural Networks: Prediction and System Modelling. Technical Report LA-UR-87-2662, Los Alamos National Laboratory (1987)
Iso, K.: Speech Recognition Using Neural Prediction Model. IEICE Technical Report SP89-23, 81–87 (1989)
Iso, K., Watanabe, T.: Speaker-Independent Word Recognition Using a Neural Prediction Model. In: Proc. IEEE Int. Conf. on ASSP, pp. 441–444 (1990)
Iso, K., Watanabe, T.: Speech Recognition Using Demi-Syllable Neural Prediction Model. Advances in Neural Information Processing Systems 3, 227–233 (1991)
Iso, K., Watanabe, T.: Large Vocabulary Speech Recognition Using Neural Prediction Model. In: Proc. IEEE Int. Conf. on ASSP, pp. 57–60 (1991)
Levin, E.: Word Recognition Using Hidden Control Neural Architecture. In: Proc. Speech-Tech 1990, pp. 20–25 (1990)
Levin, E.: Word Recognition Using Hidden Control Neural Architecture. In: Proc. IEEE Int. Conf. on ASSP, pp. 433–436 (1990)
Levin, E.: Modeling Time Varying Systems Using a Hidden Control Neural Network Architecture. Advances in Neural Information Processing Systems 3, 147–154 (1991)
Tebelskis, J., Waibel, A.: Large Vocabulary Recognition Using Linked Predictive Neural Networks. In: Proc. IEEE Int. Conf. on ASSP, pp. 437–440 (1990)
Tebelskis, J., Waibel, A., Petek, B., Schmidbauer, O.: Continuous Speech Recognition by Linked Predictive Neural Networks. Advances in Neural Information Processing Systems 3, 199–205 (1991)
Tebelskis, J., Waibel, A., Petek, B., Schmidbauer, O.: Continuous Speech Recognition Using Linked Predictive Neural Networks. In: Proc. IEEE Int. Conf. on ASSP, pp. 61–64 (1991)
Tebelskis, J.: Speech Recognition using Neural Networks. PhD thesis, School of Computer Science, Pittsburgh, PA (1995)
Tishby, N.: A Dynamical Systems Approach to Speech Processing. In: Proc. IEEE Int. Conf. on ASSP, pp. 365–368 (1990)
Cybenko, G.: Approximation by Superpositions of a Sigmoidal Function. Technical report CSRD 856, University of Illinois (1989)
Funahashi, K.: On the Approximate Realization of Continuous Mappings by Neural Networks. Neural Networks 2, 183–192 (1989)
Hornik, K., Stinchcombe, M., White, H.: Multi-Layer Feedforward Networks are Universal Approximators. Technical Report UCSD (1989)
Hornik, K.: Approximation Capabilities of Multilayer Feedforward Networks. Neural Networks 4, 251–257 (1991)
McClelland, J.L., Rumelhart, D.E., the PDP Research Group: Parallel Distributed Processing, vol. 2, ch. 18, pp. 217–268. MIT Press, Cambridge (1986)
Lee, K.F.: Large Vocabulary Speaker Independent Continuous Speech Recognition: the SPHINX System. PhD dissertation, Computer Science Department, Carnegie Mellon University (1988)
Ney, H.: The Use of a One-Stage Dynamic Programming Algorithm for Connected Word Recognition. IEEE Trans. on ASSP 32(2), 263–271 (1984)
Schmidbauer, O., Tebelskis, J.: An LVQ Based Reference Model for Speaker-Adaptive Speech Recognition. In: IEEE Int. Conf. on ASSP, vol. 1, pp. 441–445 (1992)
Kohonen, T., Barna, G., Chrisley, R.: Statistical Pattern Recognition with Neural Networks: Benchmarking Studies. In: Proc. IEEE Int. Conf. on Neural Networks, pp. 61–66 (1988)
Mellouk, A., Gallinari, P.: A Discriminative Neural Prediction System for Speech Recognition. In: Proc. IEEE Int. Conf. on ASSP, pp. 533–536 (1993)
Mellouk, A., Gallinari, P.: Discriminative Training for Improved Neural Prediction Systems. In: Proc. IEEE Int. Conf. on ASSP, pp. I 233–236 (1994)
Mellouk, A., Gallinari, P.: Global Discrimination for Neural Predictive Systems based on N-best algorithm. In: Proc. IEEE Int. Conf. on ASSP, pp. 465–468 (1995)
Gallinari, P.: Predictive Models for Sequence Modelling, Application to Speech and Character Recognition (2004), http://citeseer.ist.psu.edu/28957.html (accessed October 2004)
NATO ASI on Dynamics of Speech Production and Perception. Kluwer Academic Publishers, Dordrecht (2002)
Deng, L., Huang, X.: Challenges in Adopting Speech Recognition. Comm. of the ACM 47(1), 69–75 (2004)
Forbes, B.J., Pike, E.R.: Acoustical Klein-Gordon Equation: A Time-Independent Perturbation Analysis. Phys. Rev. Lett. 93, 054301 (2004)

Author information

Authors and Affiliations

Interactive Systems Laboratory, University of Ljubljana, Snežniška 5, 1000, Ljubljana, Slovenia
Bojan Petek

Editor information

Editors and Affiliations

CNRS LTCI/TSI Paris, 46 rue Barrault, 75634, Paris Cedex 13, France
Gérard Chollet
Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare, SA, Italy
Anna Esposito
Escola Universitària Politècnica de Mataró, Universitat Politècnica de Catalunya, Barcelona, Spain
Marcos Faundez-Zanuy
Dipartimento di Fisica “E.R. Caianiello”, Università degli Studi di Salerno, Via S. Allende, 84081, Baronissi, SA, Italy
Maria Marinaro

About this paper

Cite this paper

Petek, B. (2005). Predictive Connectionist Approach to Speech Recognition. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_10

DOI: https://doi.org/10.1007/11520153_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27441-4
Online ISBN: 978-3-540-31886-6
eBook Packages: Computer Science, Computer Science (R0)
