Abstract
This chapter discusses the structure of acoustic models and training algorithms for speech recognition. It is generally recognized that the more complex an acoustic model is, the more training data it requires. One effective remedy is tying at multiple levels, such as the allophone, state, distribution, or parameter level. Tied structures such as generalized triphones, state tying, and tied mixtures have been one of the main streams of research in acoustic modeling of speech. They offer not only precise and robust modeling but also significant computational advantages. This chapter introduces the Hidden Markov Network (HMnet), which is obtained with the Successive State Splitting algorithm. The ultimate goal is an acoustic model with a fully tied structure at four levels. Vector Field Smoothing (VFS) for speaker adaptation is also discussed as a way to train acoustic models more efficiently.
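The computational advantage of tying can be illustrated at the distribution level with tied mixtures. Every state draws on one shared pool of Gaussians and keeps only its own mixture weights, so each Gaussian is evaluated once per frame no matter how many states use it. The following is a minimal sketch under stated assumptions: diagonal covariances, a toy two-Gaussian pool, and hypothetical names such as `TiedMixtureModel`, `state_A`, and `state_B`. It is not the HMnet or the four-level tied structure described in the chapter.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class TiedMixtureModel:
    """Hypothetical container: one shared Gaussian pool plus per-state weights."""

    def __init__(self, means, variances, state_weights):
        self.means = means                  # shared pool: K mean vectors
        self.variances = variances          # shared pool: K diagonal variances
        self.state_weights = state_weights  # state name -> K mixture weights
        self.gaussian_evals = 0             # counts evaluations of pool Gaussians

    def frame_log_likelihoods(self, x):
        """Score every state on one frame, evaluating each pool Gaussian once."""
        shared = [log_gauss_diag(x, m, v) for m, v in zip(self.means, self.variances)]
        self.gaussian_evals += len(shared)
        scores = {}
        for state, weights in self.state_weights.items():
            # Each state only reweights the shared scores; zero weights are skipped.
            terms = [math.log(w) + g for w, g in zip(weights, shared) if w > 0.0]
            scores[state] = logsumexp(terms)
        return scores


# Toy usage: two states share a two-Gaussian pool with different weights.
model = TiedMixtureModel(
    means=[[0.0, 0.0], [3.0, 3.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
    state_weights={"state_A": [0.9, 0.1], "state_B": [0.2, 0.8]},
)
scores = model.frame_log_likelihoods([0.1, -0.2])
print(max(scores, key=scores.get))  # state_A
print(model.gaussian_evals)         # 2
```

In this design, adding states only adds weighted sums over the shared scores. The Gaussian evaluations, which usually dominate the cost, grow with the size of the pool rather than with the number of states.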
© 1996 Kluwer Academic Publishers
About this chapter
Cite this chapter
Sagayama, S. (1996). Hidden Markov Network for Precise and Robust Acoustic Modeling. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_7
DOI: https://doi.org/10.1007/978-1-4613-1367-0_7
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-8590-8
Online ISBN: 978-1-4613-1367-0
eBook Packages: Springer Book Archive