
Part of the book series: Advances in Computer Vision and Pattern Recognition ((ACVPR))


Abstract

Despite the great importance of integrated search methods for applying Markov models to real-world problems, their presentation is frequently neglected in the literature. For current speech or handwriting recognition systems, it is often difficult to determine from the respective publications which algorithmic solutions were chosen for the problem of combining HMMs and n-gram models.

Therefore, this chapter presents the most important methods for integrating HMMs and n-gram models. We begin with what is probably the oldest and simplest technique, in which a model network is created from partial models and the n-gram scores serve as transition probabilities. In multi-pass search methods, long-span n-gram restrictions are applied only in a second search pass in order to reduce the total search effort. Integrated time-synchronous decoding procedures are usually based on an efficient representation of the recognition lexicon as a prefix tree; in such configurations, copies of the resulting search space must be created so that the n-gram score can be integrated directly into the search. The chapter concludes with a flexible technique for integrated time-synchronous search in combined HMM/n-gram models that can also handle long-span context restrictions efficiently.
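The first of these schemes, a word-level model network whose transitions carry n-gram scores, can be sketched as a Viterbi search over word hypotheses. The sketch below, with an invented two-word vocabulary, toy bigram probabilities, and pre-segmented per-word acoustic scores (a strong simplification of real time-synchronous decoding), only illustrates how the n-gram score enters as a transition score between partial models:

```python
import math

# Toy illustration: Viterbi search over a word network in which
# bigram scores act as transition probabilities (all values invented).
words = ["a", "b"]
# log P(w' | w): bigram language model
bigram = {
    ("<s>", "a"): math.log(0.7), ("<s>", "b"): math.log(0.3),
    ("a", "a"): math.log(0.2), ("a", "b"): math.log(0.8),
    ("b", "a"): math.log(0.6), ("b", "b"): math.log(0.4),
}
# log P(X_t | w): per-segment acoustic (HMM) scores for three segments
acoustic = [
    {"a": math.log(0.5), "b": math.log(0.1)},
    {"a": math.log(0.2), "b": math.log(0.6)},
    {"a": math.log(0.4), "b": math.log(0.3)},
]

def decode(acoustic, bigram):
    # one surviving hypothesis per predecessor word: (log score, sequence)
    hyps = {"<s>": (0.0, [])}
    for seg in acoustic:
        new_hyps = {}
        for prev, (score, seq) in hyps.items():
            for w in words:
                # acoustic score plus bigram transition score
                s = score + bigram[(prev, w)] + seg[w]
                if w not in new_hyps or s > new_hyps[w][0]:
                    new_hyps[w] = (s, seq + [w])
        hyps = new_hyps
    return max(hyps.values())

score, best = decode(acoustic, bigram)
print(best)  # most probable word sequence under the combined model
```

Real decoders keep the HMM states of each word explicit and propagate hypotheses frame by frame; the recombination per predecessor word shown here is the essential idea behind the network-based integration.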


Notes

  1.

    This is mainly due to the fact that probabilities on largely differing time-scales enter into the calculations for P(w) and P(X|w), respectively. For determining the HMM score, state transition and output probabilities accumulate per time step, i.e., with the “clock pulse” of the signal. The score of the word sequence, in contrast, is generated by multiplying one conditional probability per word and, therefore, comprises one to two orders of magnitude fewer probability components.
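The scale mismatch can be made concrete with invented but typical magnitudes: at 100 frames per second, the HMM score accumulates one log-probability term per frame, while the n-gram score adds only one term per word, so the acoustic score dominates by one to two orders of magnitude:

```python
import math

# Illustrative (invented) magnitudes: a 3-second utterance of 10 words
frames_per_second, duration_s, n_words = 100, 3, 10
avg_frame_logprob = math.log(0.1)   # per-frame HMM emission/transition score
avg_word_logprob = math.log(0.01)   # per-word n-gram score

# HMM score: one term per frame, i.e. 300 accumulated terms
hmm_log_score = frames_per_second * duration_s * avg_frame_logprob
# n-gram score: one term per word, i.e. only 10 terms
lm_log_score = n_words * avg_word_logprob

print(hmm_log_score)  # ≈ -690.8
print(lm_log_score)   # ≈ -46.1
```

With these numbers the acoustic score is 15 times larger in magnitude, which is why an unweighted product lets P(X|w) dominate P(w) and a language-model weight becomes necessary.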

  2.

    Frequently, it is claimed that a weighted combination of n-gram and HMM scores is necessary because the models were trained on different data and are therefore not fully compatible. However, this is at most of marginal importance, as experiments within the German Verbmobil project showed clearly: for the extensive evaluation in the year 1996, the training data for HMMs and language model was identical, yet a weighting of the scores was still necessary.

  3.

    In [331] an evaluation of the meta-parameters for combining HMMs and n-gram models is reported for a writer-independent handwriting recognition task.

  4.

    Without the use of a real n-gram language model, the term β^{|w|} has the same effect as a simple zero-gram model.
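A small numerical check (with an invented penalty value) shows why: multiplying a constant factor β into the score once per word is exactly the score a zero-gram model assigning P(w_i) = β to every word would contribute:

```python
import math

beta = 0.05                      # word insertion penalty (invented value)
w = ["the", "cat", "sat"]        # hypothesized word sequence

# log beta^{|w|}: the insertion-penalty term
penalty_log_score = len(w) * math.log(beta)
# zero-gram model: one constant log-probability per word
zerogram_log_score = sum(math.log(beta) for _ in w)

print(penalty_log_score, zerogram_log_score)  # identical contribution
```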

  5.

    A notable exception is the book by Huang and colleagues, which gives a good overview of possible techniques [123, Chap. 13, pp. 645–662].

References

  1. Aubert, X., Dugast, C., Ney, H., Steinbiss, V.: Large vocabulary continuous speech recognition of Wall Street Journal data. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Adelaide, vol. II, pp. 129–132 (1994)


  2. Billa, J., Ma, K., McDonough, J.W., Zavaliagkos, G., Miller, D.R., Ross, K.N., El-Jaroudi, A.: Multilingual speech recognition: the 1996 Byblos Callhome system. In: Proc. European Conf. on Speech Communication and Technology, Rhodes, Greece, vol. 1, pp. 363–366 (1997)


  3. Federico, M., Cettolo, M., Brugnara, F., Antoniol, G.: Language modelling for efficient beam-search. Comput. Speech Lang. 9, 353–379 (1995)


  4. Fink, G.A., Sagerer, G.: Zeitsynchrone Suche mit n-Gramm-Modellen höherer Ordnung (Time-synchronous search with higher-order n-gram models). In: Konvens 2000/Sprachkommunikation. ITG-Fachbericht, vol. 161, pp. 145–150. VDE Verlag, Berlin (2000) (in German)


  5. Fink, G.A., Schillo, C., Kummert, F., Sagerer, G.: Incremental speech recognition for multimodal interfaces. In: Proc. Annual Conference of the IEEE Industrial Electronics Society, Aachen, vol. 4, pp. 2012–2017 (1998)


  6. Gauvain, J.L., Lamel, L.F., Adda, G., Adda-Decker, M.: The LIMSI continuous speech dictation system: evaluation on the ARPA Wall Street Journal task. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Adelaide, vol. I, pp. 557–560 (1994)


  7. Hain, T., Woodland, P.C., Niesler, T.R., Whittaker, E.W.D.: The 1998 HTK system for transcription of conversational telephone speech. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Phoenix, AZ (1999)


  8. Huang, X., Acero, A., Hon, H.-W.: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, Englewood Cliffs (2001)


  9. Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1997)


  10. Jelinek, F., Bahl, L.R., Mercer, R.L.: Design of a linguistic statistical decoder for the recognition of continuous speech. IEEE Trans. Inf. Theory 21(3), 250–256 (1975)


  11. Jelinek, F., Mercer, R.L., Bahl, L.R.: Continuous speech recognition. In: Krishnaiah, P.R., Kanal, L.N. (eds.) Handbook of Statistics, vol. 2, pp. 549–573. North-Holland, Amsterdam (1982)


  12. Lowerre, B.T.: The HARPY speech recognition system. PhD thesis, Carnegie-Mellon University, Department of Computer Science, Pittsburgh (1976)


  13. Ney, H., Haeb-Umbach, R., Tran, B.H., Oerder, M.: Improvements in beam search for 10000-word continuous speech recognition. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, San Francisco, vol. 1, pp. 9–12 (1992)


  14. Ney, H., Ortmanns, S.: Dynamic programming search for continuous speech recognition. IEEE Signal Process. Mag. 16(5), 64–83 (1999)


  15. Ortmanns, S., Eiden, A., Ney, H., Coenen, N.: Look-ahead techniques for fast beam search. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, München, vol. 3, pp. 1783–1786 (1997)


  16. Ortmanns, S., Ney, H.: Look-ahead techniques for fast beam search. Comput. Speech Lang. 14, 15–32 (2000)


  17. Ortmanns, S., Ney, H.: The time-conditioned approach in dynamic programming search for LVCSR. IEEE Trans. Audio Speech Lang. Process. 8(6), 676–687 (2000)


  18. Ortmanns, S., Ney, H., Eiden, A.: Language-model look-ahead for large vocabulary speech recognition. In: Proc. Int. Conf. on Spoken Language Processing, Philadelphia, pp. 2095–2098 (1996)


  19. Ortmanns, S., Ney, H., Seide, F., Lindam, I.: A comparison of time conditioned and word conditioned search techniques for large vocabulary speech recognition. In: Proc. Int. Conf. on Spoken Language Processing, Philadelphia, pp. 2091–2094 (1996)


  20. Rabiner, L.R., Juang, B.-H.: Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs (1993)


  21. Schwartz, R., Nguyen, L., Kubala, F., Chou, G., Zavaliagkos, G., Makhoul, J.: On using written language training data for spoken language modeling. In: Proc. Workshop on Human Language Technology, HLT ’94, pp. 94–98 (1994)


  22. Steinbiss, V., Tran, B.-H., Ney, H.: Improvements in beam search. In: Proc. Int. Conf. on Spoken Language Processing, Yokohama, Japan, vol. 4, pp. 2143–2146 (1994)


  23. Zimmermann, M., Bunke, H.: Optimizing the integration of a statistical language model in HMM based offline handwritten text recognition. In: Proc. Int. Conf. on Pattern Recognition, Cambridge, UK, vol. 2, pp. 541–544 (2004)



Copyright information

© 2014 Springer-Verlag London

About this chapter

Cite this chapter

Fink, G.A. (2014). Integrated Search Methods. In: Markov Models for Pattern Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6308-4_12

  • DOI: https://doi.org/10.1007/978-1-4471-6308-4_12

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-6307-7

  • Online ISBN: 978-1-4471-6308-4
