Abstract
Tone plays an important lexical role in spoken tonal languages such as Mandarin Chinese. In this paper we propose a two-pass search strategy for improving tonal syllable recognition performance. In the first pass, instantaneous F0 information is used together with the corresponding cepstral information in two-stream HMM-based decoding. The F0 stream, which incorporates both discrete voiced/unvoiced information and the continuous F0 contour, is modeled with a multi-space distribution. With the first-pass decoding alone, we recently reported a 24% relative reduction in tonal syllable recognition errors on a Mandarin Chinese database [5]. In the second pass, F0 information over a longer, horizontal time span is used to build explicit tone models for rescoring the lattice generated in the first pass. Experimental results on the same Mandarin database show that the second-pass search, i.e., lattice rescoring with enhanced tone models, yields an additional 8% relative reduction in tonal syllable recognition errors.
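As a rough illustration of the first-pass F0 stream described above, the sketch below scores one frame under a two-space multi-space distribution: a zero-dimensional space for unvoiced frames and a one-dimensional Gaussian for voiced frames. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the single Gaussian for the voiced space, and the stream weights are all hypothetical choices made for illustration.

```python
import math


def msd_f0_log_likelihood(f0, w_voiced, mean, var):
    """Log-likelihood of one F0 observation under a two-space MSD.

    f0       -- None for an unvoiced frame, else a float (e.g. log-F0).
    w_voiced -- weight of the voiced space; the unvoiced space gets 1 - w_voiced.
    mean/var -- parameters of the (assumed single) Gaussian on the voiced space.
    """
    if not 0.0 < w_voiced < 1.0:
        raise ValueError("w_voiced must lie strictly between 0 and 1")
    if var <= 0.0:
        raise ValueError("var must be positive")
    if f0 is None:
        # Unvoiced: the space is zero-dimensional, so only its weight counts.
        return math.log(1.0 - w_voiced)
    # Voiced: space weight times a 1-D Gaussian density, in the log domain.
    log_gauss = -0.5 * (math.log(2.0 * math.pi * var) + (f0 - mean) ** 2 / var)
    return math.log(w_voiced) + log_gauss


def two_stream_log_likelihood(cep_ll, f0_ll, cep_weight=1.0, f0_weight=1.0):
    """Combine cepstral and F0 stream log-likelihoods with assumed stream weights."""
    return cep_weight * cep_ll + f0_weight * f0_ll
```

A state's frame score would then be `two_stream_log_likelihood(cep_ll, msd_f0_log_likelihood(f0, ...))`, where `cep_ll` comes from whatever cepstral model is in use. Handling voiced and unvoiced frames in one distribution avoids interpolating F0 through unvoiced regions.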
References
1. Hirst, D., Espesser, R.: Automatic Modeling of Fundamental Frequency Using a Quadratic Spline Function. Travaux de l’Institut de Phonétique d’Aix 15, 71–85 (1993)
2. Chen, C.J., Gopinath, R.A., Monkowski, M.D., Picheny, M.A., Shen, K.: New Methods in Continuous Mandarin Speech Recognition. In: Proc. Eurospeech 1997, pp. 1543–1546 (1997)
3. Chang, E., Zhou, J.-L., Di, S., Huang, C., Lee, K.-F.: Large Vocabulary Mandarin Speech Recognition with Different Approach in Modeling Tones. In: Proc. ICSLP 2000, pp. 983–986 (2000)
4. Freij, G.J., Fallside, F.: Lexical Stress Recognition Using Hidden Markov Models. In: Proc. ICASSP 1988, pp. 135–138 (1988)
5. Wang, H.L., Qian, Y., Soong, F.K., Zhou, J.-L., Han, J.Q.: A Multi-Space Distribution (MSD) Approach to Speech Recognition of Tonal Languages. In: Proc. ICSLP 2006 (2006)
6. Tokuda, K., Masuko, T., Miyazaki, N., Kobayashi, T.: Multi-space Probability Distribution HMM. IEICE Trans. Inf. & Syst. E85-D(3), 455–464 (2002)
7. Lin, C.H., Wu, C.H., Ting, P.Y., Wang, H.M.: Framework for Recognition of Mandarin Syllables with Tones Using Sub-syllabic Units. Journal of Speech Communication 18(2), 175–190 (1996)
8. Qian, Y., Soong, F.K., Lee, T.: Tone-enhanced Generalized Character Posterior Probability (GCPP) for Cantonese LVCSR. In: Proc. ICASSP 2006, pp. 133–136 (2006)
9. Tian, Y., Zhou, J.-L., Chu, M., Chang, E.: Tone Recognition with Fractionized Models and Outlined Features. In: Proc. ICASSP 2004, pp. 105–108 (2004)
10. Qian, Y.: Use of Tone Information in Cantonese LVCSR Based on Generalized Character Posterior Probability Decoding. PhD. Thesis, CUHK (2005)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, H., Qian, Y., Soong, F., Zhou, JL., Han, J. (2006). Improved Mandarin Speech Recognition by Lattice Rescoring with Enhanced Tone Models. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_47
DOI: https://doi.org/10.1007/11939993_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)