Abstract
This tutorial describes a context-dependent Hidden Control Neural Network (HCNN) architecture for large vocabulary continuous speech recognition. Its basic building element, the context-dependent HCNN model, is a connectionist network trained to capture the dynamics of sub-word units of speech. The described HCNN model belongs to a family of Hidden Markov Model/Multi-Layer Perceptron (HMM/MLP) hybrids usually referred to as Predictive Neural Networks [1]. Rather than estimating maximum a posteriori (MAP) probabilities, as is done when a network performs pattern classification, the model is trained to generate continuous, real-valued predictions of the output vector. Explicit context-dependent modeling is introduced to refine the baseline HCNN model for continuous speech recognition. The extended HCNN system was initially evaluated on the CMU Conference Registration Database, where HCNN modeling yielded better generalization performance than the Linked Predictive Neural Networks (LPNN). Several optimizations were also possible when implementing the HCNN system. The tutorial concludes with a discussion of future research on the predictive connectionist approach to speech recognition.
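To make the predictive (rather than classificatory) scoring more concrete, here is a minimal Python/NumPy sketch of how a hidden-control predictor might score an utterance. It assumes a single one-hidden-layer MLP shared by all states of a sub-word model, with the active state supplied as a one-hot "hidden control" input alongside the two previous frames. A left-to-right dynamic-programming alignment then picks the state sequence with the smallest total squared prediction error, and the model with the lowest error is chosen. All names (`hcnn_predict`, `segment_score`, `recognize`, `make_params`), the prediction order, and the squared-error distortion are illustrative assumptions. The weights are untrained placeholders, and the paper's context-dependent extension and training procedure are not reproduced.

```python
import numpy as np


def hcnn_predict(prev_frames, control, W1, b1, W2, b2):
    """Predict the next feature frame from previous frames plus a one-hot control (state) code."""
    x = np.concatenate([prev_frames.ravel(), control])
    h = np.tanh(W1 @ x + b1)   # sigmoid-shaped hidden layer
    return W2 @ h + b2          # linear output: a real-valued frame prediction, not a posterior


def segment_score(frames, n_states, params, order=2):
    """Minimum total squared prediction error over left-to-right alignments of frames to states."""
    T = len(frames)
    if T - order < n_states:   # not enough predictable frames to visit every state once
        return float("inf")
    eye = np.eye(n_states)
    # cost[t, s]: squared prediction error for frame t when state s drives the control input
    cost = np.full((T, n_states), np.inf)
    for t in range(order, T):
        for s in range(n_states):
            pred = hcnn_predict(frames[t - order:t], eye[s], *params)
            cost[t, s] = np.sum((frames[t] - pred) ** 2)
    # D[t, s]: best accumulated error ending in state s at frame t (stay, or advance by one state)
    D = np.full((T, n_states), np.inf)
    D[order, 0] = cost[order, 0]
    for t in range(order + 1, T):
        for s in range(n_states):
            prev = D[t - 1, s] if s == 0 else min(D[t - 1, s], D[t - 1, s - 1])
            D[t, s] = prev + cost[t, s]
    return float(D[T - 1, n_states - 1])


def recognize(frames, models, n_states, order=2):
    """Pick the model whose predictor explains the frames with the lowest total error."""
    scores = {name: segment_score(frames, n_states, p, order) for name, p in models.items()}
    return min(scores, key=scores.get), scores


# Usage with hypothetical, untrained placeholder weights
dim, order, n_states, hidden = 3, 2, 3, 4
n_in = order * dim + n_states


def make_params(out_bias):
    """All-zero weights, so the output bias alone sets the (constant) prediction."""
    return (np.zeros((hidden, n_in)), np.zeros(hidden),
            np.zeros((dim, hidden)), np.full(dim, out_bias))


frames = np.zeros((8, dim))
best, scores = recognize(frames, {"A": make_params(0.0), "B": make_params(1.0)}, n_states)
print(best, scores)  # A {'A': 0.0, 'B': 18.0}
```

Model "A" predicts the all-zero frames exactly, while "B" is off by 1 in each of 3 dimensions on each of the 6 predictable frames, giving an error of 18. Sharing one network across states and switching behavior through the control input is what distinguishes this style of model from using a separate predictor per state; in this sketch, trained per-unit weights could be substituted without changing the scoring loop.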
References
Bourlard, H.A., Morgan, N.: Connectionist Speech Recognition: a Hybrid Approach. Kluwer Academic Publishers, Dordrecht (1994)
Lapedes, A., Farber, R.: Nonlinear Signal Processing Using Neural Networks: Prediction and System Modelling. Technical Report LA-UR-87-2662, Los Alamos National Laboratory (1987)
Iso, K.: Speech Recognition Using Neural Prediction Model. IEICE Technical Report SP89-23, 81–87 (1989)
Iso, K., Watanabe, T.: Speaker-Independent Word Recognition Using a Neural Prediction Model. In: Proc. IEEE Int. Conf. on ASSP, pp. 441–444 (1990)
Iso, K., Watanabe, T.: Speech Recognition Using Demi-Syllable Neural Prediction Model. Advances in Neural Information Processing Systems 3, 227–233 (1991)
Iso, K., Watanabe, T.: Large Vocabulary Speech Recognition Using Neural Prediction Model. In: Proc. IEEE Int. Conf. on ASSP, pp. 57–60 (1991)
Levin, E.: Word Recognition Using Hidden Control Neural Architecture. In: Proc. Speech Tech 1990, pp. 20–25 (1990)
Levin, E.: Word Recognition Using Hidden Control Neural Architecture. In: Proc. IEEE Int. Conf. on ASSP, pp. 433–436 (1990)
Levin, E.: Modeling Time Varying Systems Using a Hidden Control Neural Network Architecture. Advances in Neural Information Processing Systems 3, 147–154 (1991)
Tebelskis, J., Waibel, A.: Large Vocabulary Recognition Using Linked Predictive Neural Networks. In: Proc. IEEE Int. Conf. on ASSP, pp. 437–440 (1990)
Tebelskis, J., Waibel, A., Petek, B., Schmidbauer, O.: Continuous Speech Recognition by Linked Predictive Neural Networks. Advances in Neural Information Processing Systems 3, 199–205 (1991)
Tebelskis, J., Waibel, A., Petek, B., Schmidbauer, O.: Continuous Speech Recognition Using Linked Predictive Neural Networks. In: Proc. IEEE Int. Conf. on ASSP, pp. 61–64 (1991)
Tebelskis, J.: Speech Recognition using Neural Networks. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA (1995)
Tishby, N.: A Dynamical Systems Approach to Speech Processing. In: Proc. IEEE Int. Conf. on ASSP, pp. 365–368 (1990)
Cybenko, G.: Approximation by Superpositions of a Sigmoidal Function. Technical Report CSRD 856, University of Illinois (1989)
Funahashi, K.: On the Approximate Realization of Continuous Mappings by Neural Networks. Neural Networks 2, 183–192 (1989)
Hornik, K., Stinchcombe, M., White, H.: Multi-Layer Feedforward Networks are Universal Approximators. Technical Report, UCSD (1989)
Hornik, K.: Approximation Capabilities of Multilayer Feedforward Networks. Neural Networks 4, 251–257 (1991)
McClelland, J.L., Rumelhart, D.E., the PDP Research Group: Parallel Distributed Processing, vol. 2, ch. 18, pp. 217–268. MIT Press, Cambridge (1986)
Lee, K.F.: Large Vocabulary Speaker Independent Continuous Speech Recognition: the SPHINX System. PhD dissertation, Computer Science Department, Carnegie Mellon University (1988)
Ney, H.: The Use of a One-Stage Dynamic Programming Algorithm for Connected Word Recognition. IEEE Trans. on ASSP 32(2), 263–271 (1984)
Schmidbauer, O., Tebelskis, J.: An LVQ Based Reference Model for Speaker-Adaptive Speech Recognition. In: Proc. IEEE Int. Conf. on ASSP, vol. 1, pp. 441–445 (1992)
Kohonen, T., Barna, G., Chrisley, R.: Statistical Pattern Recognition with Neural Networks: Benchmarking Studies. In: Proc. IEEE Int. Conf. on Neural Networks, pp. 61–66 (1988)
Mellouk, A., Gallinari, P.: A Discriminative Neural Prediction System for Speech Recognition. In: Proc. IEEE Int. Conf. on ASSP, pp. 533–536 (1993)
Mellouk, A., Gallinari, P.: Discriminative Training for Improved Neural Prediction Systems. In: Proc. IEEE Int. Conf. on ASSP, vol. I, pp. 233–236 (1994)
Mellouk, A., Gallinari, P.: Global Discrimination for Neural Predictive Systems based on N-best algorithm. In: Proc. IEEE Int. Conf. on ASSP, pp. 465–468 (1995)
Gallinari, P.: Predictive Models for Sequence Modelling, Application to Speech and Character Recognition (2004), http://citeseer.ist.psu.edu/28957.html (accessed October 2004)
NATO ASI on Dynamics of Speech Production and Perception. Kluwer Academic Publishers, Dordrecht (2002)
Deng, L., Huang, X.: Challenges in Adopting Speech Recognition. Comm. of the ACM 47(1), 69–75 (2004)
Forbes, B.J., Pike, E.R.: Acoustical Klein-Gordon Equation: A Time-Independent Perturbation Analysis. Phys. Rev. Lett. 93, 054301 (2004)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Petek, B. (2005). Predictive Connectionist Approach to Speech Recognition. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol. 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_10
DOI: https://doi.org/10.1007/11520153_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27441-4
Online ISBN: 978-3-540-31886-6
eBook Packages: Computer Science, Computer Science (R0)