Explicit Modelling of Duration in HMM: an Efficient Algorithm

  • Conference paper
Speech Recognition and Coding

Part of the book series: NATO ASI Series (NATO ASI F, volume 147)

Abstract

Hidden Markov Modelling (HMM) techniques have been applied successfully to speech recognition problems. However, it has been claimed [1]-[5] that a major weakness of HMM is that the state duration probability density functions (SDPDF) are exponential, which is not appropriate for speech signals. To cope with this deficiency, some authors have proposed modelling the state duration explicitly. In these models the first-order Markov hypothesis is broken in the loop transitions, and the new models have therefore been called Hidden Semi-Markov Models (HSMM). The first idea, to the authors' knowledge, is due to Ferguson [1] and consists of explicitly defining a probability function per state, P_i, which controls the occupancy of each state. In his paper, Ferguson estimated P_i(d) from training data. One of the problems of this model is the large number of parameters per state (D, where D is the maximum duration in any state). These parameters have to be estimated in addition to those of the usual HMM, so an enormous database is required to estimate the models accurately. Ferguson himself suggested using parametric functions to reduce the number of parameters. Levinson [3] extended the Baum-Welch algorithm and proved its convergence; he also gave the details for the case where the Gamma function is chosen as the PDF. Russell and Moore [2] used the same result to recognize speech, but by means of a Poisson function. Falaschi [4] used a particular function chosen to increase the efficiency of the algorithm. Gu, Tseng and Lee [5] proposed the use of bounded functions (exponential functions with lower and upper bounds) as a direct and simple (in training) but effective way of modelling the temporal structures in speech signals. In this paper an efficient algorithm for finding the best state sequence in HSMM is presented. In the next section we review the computational burden of these approaches and state a theorem which can effectively reduce their complexity. It is especially suitable for reducing the complexity of HSMMs such as those proposed in [2, 3, 4].
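
For orientation, a standard HMM with self-loop probability a_ii implies the geometric duration law P_i(d) = a_ii^(d-1) (1 - a_ii), whereas an explicit-duration model replaces it with a free table P_i(d), d = 1..D. The Python sketch below shows the textbook best-state-sequence recursion for such a model. It is a baseline whose cost illustrates the computational burden mentioned above, not the efficient algorithm of this paper, whose theorem is not reproduced here. The function name `hsmm_viterbi` and the argument names are hypothetical, chosen only for this illustration.

```python
import math

NEG_INF = float("-inf")


def hsmm_viterbi(log_pi, log_A, log_dur, log_B):
    """Best segmentation under an explicit-duration HSMM (baseline recursion).

    All names are illustrative assumptions, not taken from the paper.
    log_pi[j]       : log initial probability of state j
    log_A[i][j]     : log transition probability i -> j (i != j, no self-loops)
    log_dur[j][d-1] : log P_j(d) for d = 1..D
    log_B[t][j]     : log likelihood of observation t in state j
    Returns (best log score, list of (state, duration) segments).
    """
    T, N, D = len(log_B), len(log_pi), len(log_dur[0])
    # delta[t][j]: best score of frames 0..t-1 when a state-j segment ends at t
    delta = [[NEG_INF] * N for _ in range(T + 1)]
    back = [[None] * N for _ in range(T + 1)]  # (previous state, duration)

    for t in range(1, T + 1):
        for j in range(N):
            emit = 0.0  # running sum of log_B over the last d frames
            for d in range(1, min(D, t) + 1):
                emit += log_B[t - d][j]
                start = t - d
                if start == 0:
                    # First segment of the utterance: use the initial probability
                    prev_score, prev_state = log_pi[j], None
                else:
                    # Best predecessor segment ending at `start` in another state
                    prev_score, prev_state = NEG_INF, None
                    for i in range(N):
                        if i == j:
                            continue
                        s = delta[start][i] + log_A[i][j]
                        if s > prev_score:
                            prev_score, prev_state = s, i
                score = prev_score + log_dur[j][d - 1] + emit
                if score > delta[t][j]:
                    delta[t][j] = score
                    back[t][j] = (prev_state, d)

    # Backtrack from the best final state
    j = max(range(N), key=lambda k: delta[T][k])
    best = delta[T][j]
    if best == NEG_INF:
        raise ValueError("no feasible segmentation for these parameters")
    segments, t = [], T
    while t > 0:
        prev_state, d = back[t][j]
        segments.append((j, d))
        t -= d
        j = prev_state
    segments.reverse()
    return best, segments
```

As written, the nested loops cost on the order of T·D·N² operations for T frames, N states and maximum duration D, compared with T·N² for the ordinary Viterbi algorithm. This factor of D is the kind of overhead that a more efficient search aims to remove.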

This work has been supported by the grant TIC 92-1026-C02/02

References

  1. J.D. Ferguson, “Variable Duration Models for Speech,” Proc. Symposium on the Application of HMM to Text and Speech, pp. 143–179, Oct. 1986

  2. M.J. Russell and R.K. Moore, “Explicit modelling of state occupancy in HMM for automatic speech recognition,” ICASSP’85 (Tampa, FL), pp. 5–8, Mar. 1985

  3. S.E. Levinson, “Continuously variable duration HMM for automatic speech recognition,” Computer Speech and Language, vol. 1, pp. 29–45, Mar. 1986

  4. A. Falaschi, “Continuously Variable Transition Probability HMM for Speech Recognition,” in Speech Recognition and Understanding, Springer-Verlag, Berlin Heidelberg, 1992, pp. 125–130

  5. Hung-yan Gu, Chiu-yu Tseng and Lin-shan Lee, “Isolated-Utterance Speech Recognition Using HMM with Bounded State Duration,” IEEE Trans. on Signal Processing, vol. 39, no. 8, pp. 1743–1751, Aug. 1991

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bonafonte, A., Ros, X., Mariño, J.B. (1995). Explicit Modelling of Duration in HMM: an Efficient Algorithm. In: Ayuso, A.J.R., Soler, J.M.L. (eds) Speech Recognition and Coding. NATO ASI Series, vol 147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-57745-1_11

  • DOI: https://doi.org/10.1007/978-3-642-57745-1_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-63344-7

  • Online ISBN: 978-3-642-57745-1

  • eBook Packages: Springer Book Archive
