Hidden Markov Network for Precise and Robust Acoustic Modeling

Sagayama, Shigeki

doi:10.1007/978-1-4613-1367-0_7

Part of the book series: The Kluwer International Series in Engineering and Computer Science (SECS, volume 355)

Abstract

This chapter discusses the structure of acoustic models and their training algorithms for speech recognition. As is generally recognized, the more complex an acoustic model is, the more training data it demands. One effective solution is tying at multiple levels, such as the allophone, state, distribution, or parameter level. Tied structures such as generalized triphones, state tying, and tied mixtures have been one of the main streams of research in acoustic modeling of speech. They offer not only precise and robust modeling but also a significant computational advantage. This chapter introduces the Hidden Markov Network (HMnet), which is derived by the Successive State Splitting algorithm. The ultimate goal is an acoustic model with a fully tied acoustic structure at four levels. Vector Field Smoothing (VFS) for speaker adaptation is also discussed as a means of training acoustic models more efficiently.
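
As a rough illustration of why tying helps, the sketch below shows one of the tied structures named in the abstract, a tied mixture, in which all states share a single pool of Gaussians and differ only in their mixture weights. It is a toy example under stated assumptions, not the chapter's HMnet or its four-level structure: the one-dimensional codebook, the state names `a_1` and `a_2`, the weights, and all function names are hypothetical.

```python
import math

# Hypothetical shared codebook of 1-D Gaussians: (mean, variance).
# In a tied mixture, every state draws on this same pool.
CODEBOOK = [(-1.0, 1.0), (0.0, 1.0), (2.0, 0.5)]

# Hypothetical per-state mixture weights; each row sums to 1.
STATE_WEIGHTS = {
    "a_1": [0.7, 0.2, 0.1],
    "a_2": [0.1, 0.3, 0.6],
}


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def state_likelihood(x, state):
    """Output likelihood of one state: weighted sum over the shared codebook."""
    densities = [gaussian_pdf(x, m, v) for m, v in CODEBOOK]
    return sum(w * d for w, d in zip(STATE_WEIGHTS[state], densities))


def all_state_likelihoods(x):
    """Evaluate the shared Gaussians once, then reuse them for every state."""
    densities = [gaussian_pdf(x, m, v) for m, v in CODEBOOK]
    return {
        state: sum(w * d for w, d in zip(weights, densities))
        for state, weights in STATE_WEIGHTS.items()
    }


def parameter_counts(n_states, n_gauss, dim):
    """Free-parameter counts with diagonal covariances, untied vs. tied."""
    # Untied: every state owns n_gauss Gaussians (mean + variance per
    # dimension) plus one weight per Gaussian.
    untied = n_states * n_gauss * (2 * dim + 1)
    # Tied: one shared pool of Gaussians, plus per-state weights only.
    tied = n_gauss * 2 * dim + n_states * n_gauss
    return untied, tied


if __name__ == "__main__":
    print(all_state_likelihoods(0.0))
    print(parameter_counts(n_states=1000, n_gauss=256, dim=1))
```

The sketch reflects both benefits the abstract mentions, in miniature: fewer parameters to estimate from the same data, and Gaussian densities that are computed once per frame and shared across states.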

Author information

Authors and Affiliations

NTT Human Interface Laboratories, Yokosuka, 238-03, Japan
Shigeki Sagayama

Editor information

Editors and Affiliations

AT&T Bell Laboratories, Murray Hill, NJ, 07974, USA
Chin-Hui Lee & Frank K. Soong
School of Microelectronic Engineering, Griffith University, Australia
Kuldip K. Paliwal

About this chapter

Cite this chapter

Sagayama, S. (1996). Hidden Markov Network for Precise and Robust Acoustic Modeling. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_7

DOI: https://doi.org/10.1007/978-1-4613-1367-0_7
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-8590-8
Online ISBN: 978-1-4613-1367-0
eBook Packages: Springer Book Archive
