Abstract
Despite their great importance for applying Markov models to real-world problems, integrated search methods are frequently neglected in the literature. For current speech or handwriting recognition systems, it is often difficult to determine from the respective publications which algorithmic solutions were chosen for the problem of combining HMMs and n-gram models.
This chapter therefore presents the most important methods for integrating HMMs and n-gram models. It begins with what is probably the oldest and also simplest technique, in which a model network is built from partial models and the n-gram scores serve as transition probabilities. In multi-pass search methods, long-span n-gram restrictions are applied only in a second search pass in order to reduce the total search effort. Integrated time-synchronous decoding procedures are usually based on an efficient representation of the recognition lexicon as a prefix tree; in such configurations, copies of the resulting search space must be created so that the n-gram score can be integrated directly into the search. The chapter concludes with a flexible technique for integrated time-synchronous search in combined HMM/n-gram models that also handles long-span context restrictions efficiently.
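The prefix-tree representation of the recognition lexicon mentioned above can be illustrated with a minimal sketch. The example words and their phone transcriptions are made-up, and the dictionary-of-dictionaries encoding is only one of many possible implementations:

```python
# Minimal sketch of a recognition lexicon stored as a prefix tree
# (trie), as used by integrated time-synchronous decoders.
# Words and phone transcriptions below are hypothetical examples.

def build_prefix_tree(lexicon):
    """Merge the pronunciations of all words into one tree so that
    words sharing a phone prefix also share the corresponding
    model states during the search."""
    root = {}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.setdefault(phone, {})
        # Mark the leaf with the word identity. Only at the leaf is
        # the word known -- this is why the n-gram score can be
        # applied only late in the search, or via tree copies.
        node.setdefault("#words", []).append(word)
    return root

lexicon = {
    "cat": ["k", "ae", "t"],
    "can": ["k", "ae", "n"],
    "dog": ["d", "ao", "g"],
}
tree = build_prefix_tree(lexicon)
# "cat" and "can" share the prefix k-ae and thus the same subtree root:
assert set(tree["k"]["ae"]) == {"t", "n"}
```

Because the word identity is only available at the leaves, a decoder searching this tree cannot add the n-gram score at word start; this is the structural reason for the search-space copies discussed in the chapter.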
Notes
- 1.
This is mainly due to the fact that probabilities on largely differing time-scales enter into the calculations for P(w) and P(X|w), respectively. For determining the HMM score, state transition and output probabilities accumulate per time step, i.e., with the “clock pulse” of the signal. The score of the word sequence, in contrast, is generated by multiplying one conditional probability per word and, therefore, comprises one to two orders of magnitude fewer probability components.
- 2.
Frequently, the necessity of a weighted combination of n-gram and HMM scores is attributed to the models being trained on different data and, therefore, not being completely compatible. However, this is at most of marginal importance, as experiments within the German Verbmobil project showed clearly: for the extensive evaluation in the year 1996, the training data for the HMMs and the language model were identical, yet a weighting of the scores was still necessary.
- 3.
In [331] an evaluation of the meta-parameters for combining HMMs and n-gram models is reported for a writer-independent handwriting recognition task.
- 4.
Without the use of a real n-gram language model, the term β^|w| has the same effect as a simple zero-gram model.
- 5.
A notable exception is the book by Huang and colleagues, which gives a good overview of possible techniques [123, Chap. 13, pp. 645–662].
References
Aubert, X., Dugast, C., Ney, H., Steinbiss, V.: Large vocabulary continuous speech recognition of Wall Street Journal data. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Adelaide, vol. II, pp. 129–132 (1994)
Billa, J., Ma, K., McDonough, J.W., Zavaliagkos, G., Miller, D.R., Ross, K.N., El-Jaroudi, A.: Multilingual speech recognition: the 1996 Byblos Callhome system. In: Proc. European Conf. on Speech Communication and Technology, Rhodes, Greece, vol. 1, pp. 363–366 (1997)
Federico, M., Cettolo, M., Brugnara, F., Antoniol, G.: Language modelling for efficient beam-search. Comput. Speech Lang. 9, 353–379 (1995)
Fink, G.A., Sagerer, G.: Zeitsynchrone Suche mit n-Gramm-Modellen höherer Ordnung (Time-synchronous search with higher-order n-gram models). In: Konvens 2000/Sprachkommunikation. ITG-Fachbericht, vol. 161, pp. 145–150. VDE Verlag, Berlin (2000) (in German)
Fink, G.A., Schillo, C., Kummert, F., Sagerer, G.: Incremental speech recognition for multimodal interfaces. In: Proc. Annual Conference of the IEEE Industrial Electronics Society, Aachen, vol. 4, pp. 2012–2017 (1998)
Gauvain, J.L., Lamel, L.F., Adda, G., Adda-Decker, M.: The LIMSI continuous speech dictation system: evaluation on the ARPA Wall Street Journal task. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Adelaide, vol. I, pp. 557–560 (1994)
Hain, T., Woodland, P.C., Niesler, T.R., Whittaker, E.W.D.: The 1998 HTK system for transcription of conversational telephone speech. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Phoenix, AZ (1999)
Huang, X., Acero, A., Hon, H.-W.: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, Englewood Cliffs (2001)
Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1997)
Jelinek, F., Bahl, L.R., Mercer, R.L.: Design of a linguistic statistical decoder for the recognition of continuous speech. IEEE Trans. Inf. Theory 21(3), 250–256 (1975)
Jelinek, F., Mercer, R.L., Bahl, L.R.: Continuous speech recognition. In: Krishnaiah, P.R., Kanal, L.N. (eds.) Handbook of Statistics, vol. 2, pp. 549–573. North-Holland, Amsterdam (1982)
Lowerre, B.T.: The HARPY speech recognition system. PhD thesis, Carnegie-Mellon University, Department of Computer Science, Pittsburgh (1976)
Ney, H., Haeb-Umbach, R., Tran, B.H., Oerder, M.: Improvements in beam search for 10000-word continuous speech recognition. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, San Francisco, vol. 1, pp. 9–12 (1992)
Ney, H., Ortmanns, S.: Dynamic programming search for continuous speech recognition. IEEE Signal Process. Mag. 16(5), 64–83 (1999)
Ortmanns, S., Eiden, A., Ney, H., Coenen, N.: Look-ahead techniques for fast beam search. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, München, vol. 3, pp. 1783–1786 (1997)
Ortmanns, S., Ney, H.: Look-ahead techniques for fast beam search. Comput. Speech Lang. 14, 15–32 (2000)
Ortmanns, S., Ney, H.: The time-conditioned approach in dynamic programming search for LVCSR. IEEE Trans. Audio Speech Lang. Process. 8(6), 676–687 (2000)
Ortmanns, S., Ney, H., Eiden, A.: Language-model look-ahead for large vocabulary speech recognition. In: Proc. Int. Conf. on Spoken Language Processing, Philadelphia, pp. 2095–2098 (1996)
Ortmanns, S., Ney, H., Seide, F., Lindam, I.: A comparison of time conditioned and word conditioned search techniques for large vocabulary speech recognition. In: Proc. Int. Conf. on Spoken Language Processing, Philadelphia, pp. 2091–2094 (1996)
Rabiner, L.R., Juang, B.-H.: Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs (1993)
Schwartz, R., Nguyen, L., Kubala, F., Chou, G., Zavaliagkos, G., Makhoul, J.: On using written language training data for spoken language modeling. In: Proc. Workshop on Human Language Technology, HLT ’94, pp. 94–98 (1994)
Steinbiss, V., Tran, B.-H., Ney, H.: Improvements in beam search. In: Proc. Int. Conf. on Spoken Language Processing, Yokohama, Japan, vol. 4, pp. 2143–2146 (1994)
Zimmermann, M., Bunke, H.: Optimizing the integration of a statistical language model in HMM based offline handwritten text recognition. In: Proc. Int. Conf. on Pattern Recognition, Cambridge, UK, vol. 2, pp. 541–544 (2004)
Copyright information
© 2014 Springer-Verlag London
Cite this chapter
Fink, G.A. (2014). Integrated Search Methods. In: Markov Models for Pattern Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6308-4_12
DOI: https://doi.org/10.1007/978-1-4471-6308-4_12
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6307-7
Online ISBN: 978-1-4471-6308-4
eBook Packages: Computer Science, Computer Science (R0)