Abstract
Despite their great importance for applying Markov models to real-world problems, integrated search methods are frequently neglected in the literature. For current speech or handwriting recognition systems, it is often difficult to determine from the respective publications which algorithmic solutions were chosen for the problem of combining HMMs and n-gram models.
This chapter therefore presents the most important methods for integrating HMMs and n-gram models. It begins with what is probably the oldest and also simplest technique, in which a model network is built from partial models and the n-gram scores serve as transition probabilities. In multi-pass search methods, long-span n-gram restrictions are applied only in a second search pass in order to reduce the total search effort. Integrated time-synchronous decoding procedures are usually based on an efficient representation of the recognition lexicon as a prefix tree; in such configurations, copies of the resulting search space must be created so that the n-gram score can be integrated directly into the search. The chapter concludes with a flexible technique for integrated time-synchronous search in combined HMM/n-gram models that also handles long-span context restrictions efficiently.
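The prefix-tree representation of the recognition lexicon mentioned above can be illustrated with a minimal sketch. The example words and their phone transcriptions are made-up, and the dictionary-of-dictionaries encoding is only one of many possible implementations:

```python
# Minimal sketch of a recognition lexicon stored as a prefix tree
# (trie), as used by integrated time-synchronous decoders.
# Words and phone transcriptions below are hypothetical examples.

def build_prefix_tree(lexicon):
    """Merge the pronunciations of all words into one tree so that
    words sharing a phone prefix also share the corresponding
    model states during the search."""
    root = {}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.setdefault(phone, {})
        # Mark the leaf with the word identity. Only at the leaf is
        # the word known -- this is why the n-gram score can be
        # applied only late in the search, or via tree copies.
        node.setdefault("#words", []).append(word)
    return root

lexicon = {
    "cat": ["k", "ae", "t"],
    "can": ["k", "ae", "n"],
    "dog": ["d", "ao", "g"],
}
tree = build_prefix_tree(lexicon)
# "cat" and "can" share the prefix k-ae and thus the same subtree root:
assert set(tree["k"]["ae"]) == {"t", "n"}
```

Because the word identity is only available at the leaves, a decoder searching this tree cannot add the n-gram score at word start; this is the structural reason for the search-space copies discussed in the chapter.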
Notes
- 1.
This is mainly due to the fact that probabilities on largely differing time-scales enter into the calculations for P(w) and P(X|w), respectively. For determining the HMM score, state transition and output probabilities accumulate per time step, i.e., with the “clock pulse” of the signal. The score of the word sequence, in contrast, is generated by multiplying one conditional probability per word and, therefore, comprises one to two orders of magnitude fewer probability components.
- 2.
Frequently, the necessity of a weighted combination of n-gram and HMM scores is attributed to the models being trained on different data and, therefore, not being completely compatible. However, this is at most of marginal importance, as experiments within the German Verbmobil project showed clearly: for the extensive evaluation in the year 1996, the training data for the HMMs and the language model were identical, yet a weighting of the scores was still necessary.
- 3.
In [331] an evaluation of the meta-parameters for combining HMMs and n-gram models is reported for a writer-independent handwriting recognition task.
- 4.
Without the use of a real n-gram language model, the term β^|w| has the same effect as a simple zero-gram model.
- 5.
A notable exception is the book by Huang and colleagues, which gives a good overview of possible techniques [123, Chap. 13, pp. 645–662].
References
Aubert, X., Dugast, C., Ney, H., Steinbiss, V.: Large vocabulary continuous speech recognition of Wall Street Journal data. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Adelaide, vol. II, pp. 129–132 (1994)
Billa, J., Ma, K., McDonough, J.W., Zavaliagkos, G., Miller, D.R., Ross, K.N., El-Jaroudi, A.: Multilingual speech recognition: the 1996 Byblos Callhome system. In: Proc. European Conf. on Speech Communication and Technology, Rhodes, Greece, vol. 1, pp. 363–366 (1997)
Federico, M., Cettolo, M., Brugnara, F., Antoniol, G.: Language modelling for efficient beam-search. Comput. Speech Lang. 9, 353–379 (1995)
Fink, G.A., Sagerer, G.: Zeitsynchrone Suche mit n-Gramm-Modellen höherer Ordnung (Time-synchronous search with higher-order n-gram models). In: Konvens 2000/Sprachkommunikation. ITG-Fachbericht, vol. 161, pp. 145–150. VDE Verlag, Berlin (2000) (in German)
Fink, G.A., Schillo, C., Kummert, F., Sagerer, G.: Incremental speech recognition for multimodal interfaces. In: Proc. Annual Conference of the IEEE Industrial Electronics Society, Aachen, vol. 4, pp. 2012–2017 (1998)
Gauvain, J.L., Lamel, L.F., Adda, G., Adda-Decker, M.: The LIMSI continuous speech dictation system: evaluation on the ARPA Wall Street Journal task. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Adelaide, vol. I, pp. 557–560 (1994)
Hain, T., Woodland, P.C., Niesler, T.R., Whittaker, E.W.D.: The 1998 HTK system for transcription of conversational telephone speech. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Phoenix, AZ (1999)
Huang, X., Acero, A., Hon, H.-W.: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, Englewood Cliffs (2001)
Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1997)
Jelinek, F., Bahl, L.R., Mercer, R.L.: Design of a linguistic statistical decoder for the recognition of continuous speech. IEEE Trans. Inf. Theory 21(3), 250–256 (1975)
Jelinek, F., Mercer, R.L., Bahl, L.R.: Continuous speech recognition. In: Krishnaiah, P.R., Kanal, L.N. (eds.) Handbook of Statistics, vol. 2, pp. 549–573. North-Holland, Amsterdam (1982)
Lowerre, B.T.: The HARPY speech recognition system. PhD thesis, Carnegie-Mellon University, Department of Computer Science, Pittsburgh (1976)
Ney, H., Haeb-Umbach, R., Tran, B.H., Oerder, M.: Improvements in beam search for 10000-word continuous speech recognition. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, San Francisco, vol. 1, pp. 9–12 (1992)
Ney, H., Ortmanns, S.: Dynamic programming search for continuous speech recognition. IEEE Signal Process. Mag. 16(5), 64–83 (1999)
Ortmanns, S., Eiden, A., Ney, H., Coenen, N.: Look-ahead techniques for fast beam search. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, München, vol. 3, pp. 1783–1786 (1997)
Ortmanns, S., Ney, H.: Look-ahead techniques for fast beam search. Comput. Speech Lang. 14, 15–32 (2000)
Ortmanns, S., Ney, H.: The time-conditioned approach in dynamic programming search for LVCSR. IEEE Trans. Audio Speech Lang. Process. 8(6), 676–687 (2000)
Ortmanns, S., Ney, H., Eiden, A.: Language-model look-ahead for large vocabulary speech recognition. In: Proc. Int. Conf. on Spoken Language Processing, Philadelphia, pp. 2095–2098 (1996)
Ortmanns, S., Ney, H., Seide, F., Lindam, I.: A comparison of time conditioned and word conditioned search techniques for large vocabulary speech recognition. In: Proc. Int. Conf. on Spoken Language Processing, Philadelphia, pp. 2091–2094 (1996)
Rabiner, L.R., Juang, B.-H.: Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs (1993)
Schwartz, R., Nguyen, L., Kubala, F., Chou, G., Zavaliagkos, G., Makhoul, J.: On using written language training data for spoken language modeling. In: Proc. Workshop on Human Language Technology, HLT ’94, pp. 94–98 (1994)
Steinbiss, V., Tran, B.-H., Ney, H.: Improvements in beam search. In: Proc. Int. Conf. on Spoken Language Processing, Yokohama, Japan, vol. 4, pp. 2143–2146 (1994)
Zimmermann, M., Bunke, H.: Optimizing the integration of a statistical language model in HMM based offline handwritten text recognition. In: Proc. Int. Conf. on Pattern Recognition, Cambridge, UK, vol. 2, pp. 541–544 (2004)
Copyright information
© 2014 Springer-Verlag London
Cite this chapter
Fink, G.A. (2014). Integrated Search Methods. In: Markov Models for Pattern Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6308-4_12
DOI: https://doi.org/10.1007/978-1-4471-6308-4_12
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6307-7
Online ISBN: 978-1-4471-6308-4
eBook Packages: Computer Science, Computer Science (R0)