Abstract
Large-vocabulary speech recognition is computationally very expensive. We explore multi-pass search strategies as a way to reduce computation substantially without any increase in error rate. We consider two basic strategies: the N-best Paradigm and the Forward-Backward Search. Both strategies operate on the entire sentence in at least two passes. The N-best Paradigm computes alternative hypotheses for a sentence, which can later be rescored using more detailed and more expensive knowledge sources. We present and compare many algorithms for finding the N-best sentence hypotheses, and suggest which are the most efficient and accurate. The Forward-Backward Search first performs a time-synchronous forward search that finds all of the words that are likely to end at each frame within an utterance. A second, more expensive search can then be performed in the backward direction, restricting its attention to the words found in the forward pass.
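The two strategies can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The function names (`rescore_nbest`, `allowed_words_per_frame`), the log-domain scores, the linear score combination, and the beam threshold are all assumptions made for illustration. The first function re-ranks first-pass hypotheses with a more expensive scorer. The second turns forward-pass word-end scores into the per-frame word sets that a backward pass could restrict itself to.

```python
from typing import Callable, Dict, List, Set, Tuple

Hypothesis = Tuple[str, ...]  # a sentence hypothesis as a tuple of words


def rescore_nbest(
    nbest: List[Tuple[Hypothesis, float]],
    detailed_score: Callable[[Hypothesis], float],
    weight: float = 1.0,
) -> List[Tuple[Hypothesis, float]]:
    """Re-rank first-pass hypotheses with a more expensive scorer.

    Scores are log-domain (higher is better). The combined score is
    first_pass + weight * detailed_score(hyp); this linear combination
    is an assumption of the sketch, not taken from the chapter.
    """
    rescored = [(hyp, s + weight * detailed_score(hyp)) for hyp, s in nbest]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


def allowed_words_per_frame(
    forward_end_scores: List[Dict[str, float]],
    beam: float,
) -> List[Set[str]]:
    """For each frame, keep the words whose forward word-end score lies
    within `beam` of the best word-end score at that frame.

    A backward pass would then consider only these words at each frame.
    The beam criterion is a hypothetical stand-in for "likely to end".
    """
    allowed: List[Set[str]] = []
    for frame_scores in forward_end_scores:
        if not frame_scores:
            allowed.append(set())  # no word ends at this frame
            continue
        best = max(frame_scores.values())
        allowed.append({w for w, s in frame_scores.items() if s >= best - beam})
    return allowed


# Toy usage with made-up scores.
toy_lm = {"recognize": -1.0, "speech": -1.0}


def toy_detailed_score(hyp: Hypothesis) -> float:
    # Stand-in for an expensive knowledge source; unknown words cost -5.0.
    return sum(toy_lm.get(word, -5.0) for word in hyp)


first_pass = [
    (("wreck", "a", "nice", "beach"), -9.5),
    (("recognize", "speech"), -10.0),
]
reranked = rescore_nbest(first_pass, toy_detailed_score)
word_sets = allowed_words_per_frame(
    [{"the": -1.0, "a": -1.5, "zebra": -9.0}, {}], beam=2.0
)
```

In this toy run the second hypothesis overtakes the first after rescoring, and the unlikely word `zebra` is excluded from the first frame's set. A real system would apply the expensive scorer only to the short N-best list or to the restricted word sets, which is where the computational saving comes from.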
Copyright information
© 1996 Kluwer Academic Publishers
About this chapter
Cite this chapter
Schwartz, R., Nguyen, L., Makhoul, J. (1996). Multiple-Pass Search Strategies. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_18
DOI: https://doi.org/10.1007/978-1-4613-1367-0_18
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-8590-8
Online ISBN: 978-1-4613-1367-0
eBook Packages: Springer Book Archive