Abstract
This paper gives an overview of an architecture and search organization for large vocabulary continuous speech recognition (LVCSR) at RWTH. In the first part of the paper, we describe the principle and architecture of an LVCSR system. In particular, the issues of modeling and search for phoneme-based recognition are discussed. In the second part, we review the word-conditioned lexical tree search algorithm from the viewpoint of how the search space is organized. Further, we extend this method to produce high-quality word graphs. Finally, we present recognition results on the ARPA North American Business (NAB’94) task for a 64 000-word vocabulary (American English, continuous speech, speaker independent).
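As a rough illustration of the "lexical tree" mentioned in the abstract, the sketch below arranges a pronunciation lexicon as a phoneme prefix tree, so that words starting with the same phonemes share arcs. It is a minimal sketch of the data structure only, not the search algorithm of the paper. The toy lexicon, the phoneme symbols, and the names `TreeNode`, `build_lexical_tree`, and `count_arcs` are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TreeNode:
    """Node of a lexical prefix tree; each outgoing arc is labeled with one phoneme."""
    children: Dict[str, "TreeNode"] = field(default_factory=dict)
    # Words whose full pronunciation ends at this node.
    words: List[str] = field(default_factory=list)


def build_lexical_tree(lexicon: Dict[str, List[str]]) -> TreeNode:
    """Insert every pronunciation into a shared prefix tree."""
    root = TreeNode()
    for word, phonemes in lexicon.items():
        node = root
        for phoneme in phonemes:
            # Reuse the arc if another word already created this prefix.
            node = node.children.setdefault(phoneme, TreeNode())
        node.words.append(word)
    return root


def count_arcs(node: TreeNode) -> int:
    """Count phoneme arcs in the subtree below `node`."""
    return sum(1 + count_arcs(child) for child in node.children.values())


# Hypothetical toy lexicon; the phoneme symbols are illustrative only.
TOY_LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "speak": ["s", "p", "iy", "k"],
    "spell": ["s", "p", "eh", "l"],
    "tell": ["t", "eh", "l"],
}

tree = build_lexical_tree(TOY_LEXICON)
# Arcs needed if every word had its own linear phoneme sequence.
linear_arcs = sum(len(p) for p in TOY_LEXICON.values())
tree_arcs = count_arcs(tree)
print(f"linear lexicon: {linear_arcs} arcs, prefix tree: {tree_arcs} arcs")
```

In this toy case the shared prefix "s p" (and "s p iy") reduces the number of phoneme arcs compared with a linear lexicon. A side effect of the sharing is that the word identity is only known where a pronunciation ends, which is why the nodes store their words at the end of the path.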
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ortmanns, S., Welling, L., Beulen, K., Wessel, F., Ney, H. (1997). Architecture and Search Organization for Large Vocabulary Continuous Speech Recognition. In: Jarke, M., Pasedach, K., Pohl, K. (eds) Informatik ’97 Informatik als Innovationsmotor. Informatik aktuell. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60831-5_58
DOI: https://doi.org/10.1007/978-3-642-60831-5_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63066-1
Online ISBN: 978-3-642-60831-5
eBook Packages: Springer Book Archive