Explicit Modelling of Duration in HMM: an Efficient Algorithm

Bonafonte, A.; Ros, X.; Mariño, J. B.

doi:10.1007/978-3-642-57745-1_11

Part of the book series: NATO ASI Series (NATO ASI F, volume 147)

Abstract

Hidden Markov Modeling (HMM) techniques have been applied successfully to speech recognition problems. However, it has been claimed [1]-[5] that a major weakness of HMM is that the state duration probability density functions (SDPDF) are exponential, which is not appropriate for speech signals. To cope with this deficiency, some authors have proposed modelling the state duration explicitly. In these models the first-order Markov hypothesis is broken in the loop transitions, so the new models have been called Hidden Semi-Markov Models (HSMM).

The first idea, to the authors' knowledge, is due to Ferguson [1] and consists of explicitly defining a probability function per state, P_i, which controls the occupancy of each state. In his paper, Ferguson estimated P_i(d) from training data. One problem with this model is the large number of parameters per state (D, where D is the maximum duration in any state). Those parameters have to be estimated in addition to those of the usual HMM, so an enormous database is required to estimate the models accurately. Ferguson himself suggested using parametric functions to reduce the number of parameters. Levinson [3] extended the Baum-Welch algorithm and proved its convergence. He also gave the details for the case in which a Gamma density is chosen as the PDF. Russell and Moore [2] used the same result to recognize speech, but by means of a Poisson distribution. Falaschi [4] used a particular function chosen to increase the efficiency of the algorithm. Gu, Tseng and Lee [5] proposed bounded functions (exponential functions with lower and upper bounds) as a direct, simple to train, yet effective way of modelling the temporal structures that exist in speech signals.

In this paper an efficient algorithm to find the best state sequence in HSMM is presented. In the next section we review the computational burden of these approaches and state a theorem that can effectively reduce their complexity. The theorem is especially suited to reducing the complexity of HSMMs such as those proposed in [2, 3, 4].
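To make the computational burden mentioned in the abstract concrete, the following is a minimal sketch of the standard explicit-duration Viterbi search for an HSMM. It is a textbook-style baseline written under stated assumptions, not the efficient algorithm or the theorem of this paper, which the abstract does not spell out. The function name `hsmm_viterbi` and the argument names `log_pi`, `log_a`, `log_p` and `log_b` are hypothetical and chosen only for illustration. With N states, T frames and maximum duration D, the nested loops below cost on the order of N² · T · D operations, roughly D times the cost of the usual HMM Viterbi search.

```python
import math

NEG_INF = float("-inf")

def hsmm_viterbi(log_pi, log_a, log_p, log_b):
    """Best state segmentation of an explicit-duration HSMM (baseline sketch).

    log_pi[j]    : log initial probability of state j
    log_a[i][j]  : log transition probability i -> j (i != j, no self loops)
    log_p[j][d-1]: log probability that state j lasts d frames, d = 1..D
    log_b[t][j]  : log emission probability of frame t in state j
    Returns (best log score, list of (state, duration) segments).
    """
    n_states, n_frames, max_dur = len(log_pi), len(log_b), len(log_p[0])
    # delta[t][j]: best score of a segmentation whose last segment is in
    # state j and ends exactly at frame t.
    delta = [[NEG_INF] * n_states for _ in range(n_frames)]
    back = [[None] * n_states for _ in range(n_frames)]
    for t in range(n_frames):
        for j in range(n_states):
            emit = 0.0  # accumulated emission score of the last d frames
            for d in range(1, min(max_dur, t + 1) + 1):
                start = t - d + 1
                emit += log_b[start][j]
                if start == 0:
                    enter, prev = log_pi[j], None
                else:
                    enter, prev = NEG_INF, None
                    for i in range(n_states):
                        if i != j:
                            cand = delta[start - 1][i] + log_a[i][j]
                            if cand > enter:
                                enter, prev = cand, i
                score = enter + log_p[j][d - 1] + emit
                if score > delta[t][j]:
                    delta[t][j] = score
                    back[t][j] = (d, prev)
    j = max(range(n_states), key=lambda s: delta[n_frames - 1][s])
    best = delta[n_frames - 1][j]
    if best == NEG_INF:
        raise ValueError("no valid segmentation for these inputs")
    # Trace the segments back from the last frame.
    segments, t = [], n_frames - 1
    while t >= 0:
        d, prev = back[t][j]
        segments.append((j, d))
        t, j = t - d, prev
    segments.reverse()
    return best, segments

# Toy example (all values are made up): two states, D = 3, uniform durations.
l9, l1, lp = math.log(0.9), math.log(0.1), math.log(1.0 / 3.0)
log_pi = [0.0, NEG_INF]                 # always start in state 0
log_a = [[NEG_INF, 0.0], [0.0, NEG_INF]]
log_p = [[lp, lp, lp], [lp, lp, lp]]
log_b = [[l9, l1], [l9, l1], [l1, l9], [l1, l9], [l1, l9]]
best, segments = hsmm_viterbi(log_pi, log_a, log_p, log_b)
print(segments)
```

In this toy run the first two frames favour state 0 and the last three favour state 1, so the search returns one segment of duration 2 followed by one of duration 3. The innermost loop over durations `d` is the extra factor that the parametric and bounded duration models cited above try to make cheaper.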

This work has been supported by grant TIC 92-1026-C02/02.

References

[1] J.D. Ferguson, “Variable Duration Models for Speech,” Proc. Symposium on the Application of HMM to Text and Speech, pp. 143–179, Oct. 1986.
[2] M.J. Russell and R.K. Moore, “Explicit modelling of state occupancy in HMM for automatic speech recognition,” ICASSP’85 (Tampa, FL), pp. 5–8, Mar. 1985.
[3] S.E. Levinson, “Continuously variable duration HMM for automatic speech recognition,” Computer Speech and Language, vol. 1, pp. 29–45, Mar. 1986.
[4] A. Falaschi, “Continuously Variable Transition Probability HMM for Speech Recognition,” in Speech Recognition and Understanding, Springer-Verlag, Berlin Heidelberg, 1992, pp. 125–130.
[5] Hung-yan Gu, Chiu-yu Tseng and Lin-shan Lee, “Isolated-Utterance Speech Recognition Using HMM with Bounded State Duration,” IEEE Trans. on Signal Processing, vol. 39, no. 8, pp. 1743–1751, Aug. 1991.

Author information

Authors and Affiliations

Dept. of Signal Theory and Communications (U.P.C.), Apdo. 30002, Barcelona, 08080, Spain
J. B. Mariño

Editor information

Editors and Affiliations

Department of Electronics and Technology of Computers Faculty of Sciences, University of Granada, E-18071, Granada, Spain
Antonio J. Rubio Ayuso & Juan M. López Soler

About this paper

Cite this paper

Bonafonte, A., Ros, X., Mariño, J.B. (1995). Explicit Modelling of Duration in HMM: an Efficient Algorithm. In: Ayuso, A.J.R., Soler, J.M.L. (eds) Speech Recognition and Coding. NATO ASI Series, vol 147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-57745-1_11

DOI: https://doi.org/10.1007/978-3-642-57745-1_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-63344-7
Online ISBN: 978-3-642-57745-1
eBook Packages: Springer Book Archive
