Fast Search for Large Vocabulary Speech Recognition

  • Stephan Kanthak
  • Achim Sixtus
  • Sirko Molau
  • Ralf Schlüter
  • Hermann Ney
Part of the Artificial Intelligence book series (AI)

Abstract

In this article we describe methods for improving the RWTH German speech recognizer used within the Verbmobil project. In particular, we present methods for accelerating the search with both within-word and across-word phoneme models. We also study incremental methods to reduce the response time of the online speech recognizer. Finally, we present experimental off-line results for the three Verbmobil scenarios, reporting word error rates and real-time factors for both speaker-independent and speaker-dependent recognition.
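The abstract does not say which search acceleration methods are used. For orientation, the sketch below shows two generic pruning steps commonly applied in time-synchronous beam search: beam pruning, which discards hypotheses scoring too far below the best one, and histogram pruning, which caps the number of survivors. It assumes scores are negative log-probabilities (lower is better). The function `prune` and its parameters `beam` and `max_hyps` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable


def prune(scores: Dict[Hashable, float], beam: float, max_hyps: int) -> Dict[Hashable, float]:
    """Hypothetical pruning step over active search hypotheses.

    Scores are negative log-probabilities, so lower is better.
    """
    if not scores:
        return {}
    best = min(scores.values())
    # Beam pruning: drop hypotheses whose cost exceeds best + beam.
    survivors = {h: s for h, s in scores.items() if s <= best + beam}
    # Histogram pruning: if too many remain, keep only the max_hyps cheapest.
    if len(survivors) > max_hyps:
        kept = sorted(survivors.items(), key=lambda item: item[1])[:max_hyps]
        survivors = dict(kept)
    return survivors


# Example: "c" falls outside the beam, then histogram pruning keeps the two best.
print(prune({"a": 10.0, "b": 12.5, "c": 30.0, "d": 11.0}, beam=5.0, max_hyps=2))
```

A tighter beam or a smaller `max_hyps` trades recognition accuracy for speed, which is why such thresholds are usually tuned against the word error rate.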
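The incremental methods are not detailed in the abstract. One generic way to cut response time is partial traceback, described by Spohrer et al. in 1980: any words shared by the histories of all hypotheses still active in the search can no longer change, so they may be output before the utterance ends. The minimal sketch below assumes each active hypothesis is available as a plain list of words (a real decoder would recover these through back-pointers). The names `common_prefix` and `PartialTraceback` are illustrative, not the authors' implementation.

```python
from typing import List, Sequence


def common_prefix(histories: Sequence[Sequence[str]]) -> List[str]:
    """Longest word prefix shared by every active hypothesis's history."""
    if not histories:
        return []
    prefix: List[str] = []
    # zip stops at the shortest history, so no index can run past the end.
    for words in zip(*histories):
        first = words[0]
        if all(w == first for w in words):
            prefix.append(first)
        else:
            break
    return prefix


class PartialTraceback:
    """Emit words incrementally once all surviving hypotheses agree on them."""

    def __init__(self) -> None:
        self.emitted: List[str] = []

    def update(self, histories: Sequence[Sequence[str]]) -> List[str]:
        """Return the words that became fixed since the previous call."""
        prefix = common_prefix(histories)
        new_words = prefix[len(self.emitted):]
        self.emitted.extend(new_words)
        return new_words


tracer = PartialTraceback()
print(tracer.update([["guten", "Tag"], ["guten", "Abend"]]))
print(tracer.update([["guten", "Tag", "Herr"], ["guten", "Tag", "Frau"]]))
```

The first call fixes only "guten", because the active hypotheses still disagree on the second word; once the competing "Abend" hypothesis has been pruned, "Tag" is released as well.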
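The results are given as word error rates and real-time factors. Under their usual definitions, the word error rate is the minimum number of substituted, deleted and inserted words divided by the number of reference words, and the real-time factor is processing time divided by audio duration. The sketch below computes both. It is a generic illustration rather than the scoring tool used in the paper, and the function names and German example sentence are hypothetical.

```python
from typing import Sequence


def word_errors(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Levenshtein distance on words: substitutions + deletions + insertions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,  # deletion of a reference word
                curr[j - 1] + 1,  # insertion of a hypothesis word
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word errors divided by the number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    return word_errors(reference, hypothesis) / len(reference)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time over audio duration; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


ref = "wir treffen uns am Montag".split()
hyp = "wir treffen uns Montag um".split()
print(word_errors(ref, hyp), word_error_rate(ref, hyp))
print(real_time_factor(3.0, 6.0))
```

Here the hypothesis needs two edits against a five-word reference, giving a word error rate of 40%. The real-time factor of 0.5 means decoding took half as long as the audio lasted.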

Keywords

Covariance, Recombination, Turkey, Prefix, Acoustics



Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Stephan Kanthak (1)
  • Achim Sixtus (1)
  • Sirko Molau (1)
  • Ralf Schlüter (1)
  • Hermann Ney (1)

  1. Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen-University of Technology, Germany
