Architecture and Search Organization for Large Vocabulary Continuous Speech Recognition

Ortmanns, Stefan; Welling, Lutz; Beulen, Klaus; Wessel, Frank; Ney, Hermann

doi:10.1007/978-3-642-60831-5_58

Part of the book series: Informatik aktuell (INFORMAT)

Abstract

This paper gives an overview of the architecture and search organization for large vocabulary continuous speech recognition (LVCSR) at RWTH Aachen. In the first part of the paper, we describe the principle and architecture of an LVCSR system; in particular, we discuss the issues of modeling and search for phoneme-based recognition. In the second part, we review the word-conditioned lexical tree search algorithm from the viewpoint of how the search space is organized, and we extend this method to produce high-quality word graphs. Finally, we present recognition results on the ARPA North American Business (NAB’94) task for a 64 000-word vocabulary (American English, continuous speech, speaker-independent).
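The abstract names the word-conditioned lexical tree search without spelling out how it organizes the search space. As a minimal sketch only, assuming a toy lexicon with made-up phoneme transcriptions and one phoneme arc per time step (no HMM states, no acoustic or language model), the Python below illustrates the two ideas the term combines: pronunciations that share initial phonemes are merged into one prefix tree, and active hypotheses are keyed by their predecessor word, so each predecessor conceptually owns its own copy of the tree. All names (`TreeNode`, `build_lexical_tree`, `start_tree_copies`, `expand`, `LEXICON`) are hypothetical and are not taken from the RWTH system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class TreeNode:
    """One phoneme arc in the pronunciation prefix tree (lexical tree)."""
    phoneme: str | None = None
    children: dict[str, TreeNode] = field(default_factory=dict)
    # Word identities are only known where a pronunciation ends in the tree.
    word_ends: list[str] = field(default_factory=list)


# Hypothetical toy lexicon: word -> phoneme sequence.
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "speak": ["s", "p", "iy", "k"],
    "spoke": ["s", "p", "ow", "k"],
    "see": ["s", "iy"],
}


def build_lexical_tree(lexicon: dict[str, list[str]]) -> TreeNode:
    """Merge all pronunciations into one tree that shares common prefixes."""
    root = TreeNode()
    for word, phonemes in lexicon.items():
        node = root
        for ph in phonemes:
            node = node.children.setdefault(ph, TreeNode(ph))
        node.word_ends.append(word)
    return root


def count_arcs(node: TreeNode) -> int:
    """Number of phoneme arcs (all nodes below the root)."""
    return sum(1 + count_arcs(child) for child in node.children.values())


def start_tree_copies(root: TreeNode,
                      predecessor_scores: dict[str, float]) -> dict:
    """Word-conditioned start: one hypothesis at the root per predecessor word.

    Keys are (predecessor word, tree node); values are accumulated costs
    (lower is better), here simply given as assumed inputs.
    """
    return {(v, root): score for v, score in predecessor_scores.items()}


def expand(hyps: dict, arc_cost, beam: float) -> dict:
    """One toy time step: follow every outgoing arc, recombine, beam-prune."""
    new: dict = {}
    for (v, node), score in hyps.items():
        for child in node.children.values():
            key = (v, child)  # same tree copy (v) and same node -> recombine
            s = score + arc_cost(child.phoneme)
            if s < new.get(key, float("inf")):
                new[key] = s
    if not new:
        return new
    best = min(new.values())
    # Beam pruning: drop hypotheses much worse than the current best one.
    return {k: s for k, s in new.items() if s <= best + beam}


if __name__ == "__main__":
    root = build_lexical_tree(LEXICON)
    linear = sum(len(p) for p in LEXICON.values())
    print(f"tree arcs: {count_arcs(root)}, linear lexicon arcs: {linear}")
```

For this four-word toy lexicon the tree needs 8 phoneme arcs where a linear lexicon would need 14, because the words share initial phonemes. The price of sharing is that the word identity is only available at the `word_ends` of a node, which is why the search keeps separate tree copies per predecessor word rather than a single tree.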

Author information

Authors and Affiliations

Lehrstuhl für Informatik VI, RWTH Aachen, D-52056, Aachen, Germany
Stefan Ortmanns, Lutz Welling, Klaus Beulen, Frank Wessel & Hermann Ney

Editor information

Editors and Affiliations

Lehrstuhl für Informatik V, RWTH Aachen, D-52056, Aachen, Germany
Matthias Jarke & Klaus Pohl

Philips GmbH Forschungslaboratorien, Weißhausstr. 2, D-52066, Aachen, Germany
Klaus Pasedach

About this paper

Cite this paper

Ortmanns, S., Welling, L., Beulen, K., Wessel, F., Ney, H. (1997). Architecture and Search Organization for Large Vocabulary Continuous Speech Recognition. In: Jarke, M., Pasedach, K., Pohl, K. (eds) Informatik ’97 Informatik als Innovationsmotor. Informatik aktuell. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60831-5_58

DOI: https://doi.org/10.1007/978-3-642-60831-5_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63066-1
Online ISBN: 978-3-642-60831-5
eBook Packages: Springer Book Archive
