Abstract

Large-vocabulary speech recognition is computationally very expensive. We explore multi-pass search strategies as a way to reduce computation substantially without increasing the error rate. We consider two basic strategies: the N-best Paradigm and the Forward-Backward Search. Both operate on the entire sentence in at least two passes. The N-best Paradigm computes alternative hypotheses for a sentence, which can later be rescored using more detailed and more expensive knowledge sources. We present and compare many algorithms for finding the N-best sentence hypotheses and suggest which are the most efficient and accurate. The Forward-Backward Search performs a time-synchronous forward search that finds all of the words likely to end at each frame within an utterance. A second, more expensive search can then be performed in the backward direction, restricting its attention to the words found in the forward pass.
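
The two ideas in the abstract can be illustrated with a minimal Python sketch. This is not the chapter's implementation: the names `Hypothesis`, `rescore_nbest`, `backward_candidates`, the `weight` parameter, and the `beam` threshold are all hypothetical and chosen only for illustration. The first function re-ranks a list of first-pass hypotheses with a more expensive scorer; the second restricts the backward pass to words that the forward pass found ending at a given frame.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Hypothesis:
    """One sentence hypothesis from a cheap first pass (hypothetical type)."""
    words: Tuple[str, ...]
    first_pass_score: float  # log-score assigned by the first pass


def rescore_nbest(
    nbest: List[Hypothesis],
    detailed_score: Callable[[Tuple[str, ...]], float],
    weight: float = 1.0,
) -> List[Hypothesis]:
    """Re-rank N-best hypotheses using a more expensive knowledge source.

    `detailed_score` is an assumed callable returning a log-score for a
    word sequence; `weight` is an assumed interpolation weight.
    """
    def combined(h: Hypothesis) -> float:
        return h.first_pass_score + weight * detailed_score(h.words)

    # Highest combined log-score first.
    return sorted(nbest, key=combined, reverse=True)


def backward_candidates(
    forward_word_ends: Dict[int, Dict[str, float]],
    frame: int,
    beam: float,
) -> Set[str]:
    """Words the backward pass may consider at `frame`.

    `forward_word_ends` maps a frame index to the words the forward pass
    found ending there, with their forward log-scores. The `beam` cut-off
    is an illustrative assumption, not a detail taken from the abstract.
    """
    scores = forward_word_ends.get(frame, {})
    if not scores:
        return set()
    best = max(scores.values())
    return {word for word, score in scores.items() if score >= best - beam}
```

In this sketch the expensive scorer only ever sees the N hypotheses (or the per-frame word sets) produced by the cheap pass, which is where the computational saving would come from.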

Copyright information

© 1996 Kluwer Academic Publishers

About this chapter

Cite this chapter

Schwartz, R., Nguyen, L., Makhoul, J. (1996). Multiple-Pass Search Strategies. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_18

  • DOI: https://doi.org/10.1007/978-1-4613-1367-0_18

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-8590-8

  • Online ISBN: 978-1-4613-1367-0

  • eBook Packages: Springer Book Archive
