Improved Mandarin Speech Recognition by Lattice Rescoring with Enhanced Tone Models

Wang, Huanliang; Qian, Yao; Soong, Frank; Zhou, Jian-Lai; Han, Jiqing

doi:10.1007/11939993_47

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Included in the following conference series:

International Symposium on Chinese Spoken Language Processing

Abstract

Tone plays an important lexical role in spoken tonal languages such as Mandarin Chinese. In this paper we propose a two-pass search strategy for improving tonal syllable recognition. In the first pass, instantaneous F0 information is used together with the corresponding cepstral information in two-stream HMM-based decoding. The F0 stream, which incorporates both the discrete voiced/unvoiced decision and the continuous F0 contour, is modeled with a multi-space distribution. With the first-pass decoding alone, we recently reported a 24% relative reduction in tonal syllable recognition errors on a Mandarin Chinese database [5]. In the second pass, F0 information over a horizontal, longer time span is used to build explicit tone models for rescoring the lattice generated in the first pass. Experimental results on the same Mandarin database show that the second-pass search, lattice rescoring with enhanced tone models, yields an additional 8% relative reduction in tonal syllable recognition errors.
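
The second-pass idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lattice, the syllable labels, all scores, and the `tone_weight` parameter are hypothetical, and the log-linear combination of the first-pass score with a tone-model score is an assumption, since the abstract does not state the scoring formula.

```python
# Hypothetical toy lattice from a first pass. Each arc is
# (start_node, end_node, tonal_syllable, first_pass_log_score).
ARCS = [
    (0, 1, "ma1", -4.0),
    (0, 1, "ma3", -3.8),
    (1, 2, "hao3", -2.0),
    (1, 2, "hao4", -2.1),
]

# Hypothetical log-likelihoods from an explicit tone model, as if each had
# been computed from the F0 contour over the whole time span of the arc.
TONE_LOGP = {
    (0, 1, "ma1"): -0.3,
    (0, 1, "ma3"): -1.5,
    (1, 2, "hao3"): -0.4,
    (1, 2, "hao4"): -1.2,
}


def rescore(arcs, tone_logp, tone_weight):
    """Add a weighted tone-model score to every arc (assumed log-linear form)."""
    return [
        (s, e, label, score + tone_weight * tone_logp[(s, e, label)])
        for s, e, label, score in arcs
    ]


def best_path(arcs, start, end):
    """Best-scoring path through a DAG whose integer node ids are in
    topological order (every arc goes from a smaller id to a larger one)."""
    best = {start: (0.0, [])}
    for s, e, label, score in sorted(arcs, key=lambda arc: arc[0]):
        if s not in best:
            continue  # node not reachable from the start node
        candidate = best[s][0] + score
        if e not in best or candidate > best[e][0]:
            best[e] = (candidate, best[s][1] + [label])
    return best[end]


first_pass = best_path(ARCS, 0, 2)
second_pass = best_path(rescore(ARCS, TONE_LOGP, tone_weight=1.0), 0, 2)
print(first_pass[1])   # ['ma3', 'hao3']
print(second_pass[1])  # ['ma1', 'hao3']
```

In this toy case the first pass slightly prefers "ma3", and the tone-model score tips the decision to "ma1" after rescoring. In practice the weight would presumably be tuned on held-out data.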

References

1. Hirst, D., Espesser, R.: Automatic Modeling of Fundamental Frequency Using a Quadratic Spline Function. Travaux de l’Institut de Phonétique d’Aix 15, 71–85 (1993)
2. Chen, C.J., Gopinath, R.A., Monkowski, M.D., Picheny, M.A., Shen, K.: New Methods in Continuous Mandarin Speech Recognition. In: Proc. Eurospeech 1997, pp. 1543–1546 (1997)
3. Chang, E., Zhou, J.-L., Di, S., Huang, C., Lee, K.-F.: Large Vocabulary Mandarin Speech Recognition with Different Approach in Modeling Tones. In: Proc. ICSLP 2000, pp. 983–986 (2000)
4. Freij, G.J., Fallside, F.: Lexical Stress Recognition Using Hidden Markov Models. In: Proc. ICASSP 1988, pp. 135–138 (1988)
5. Wang, H.L., Qian, Y., Soong, F.K., Zhou, J.-L., Han, J.Q.: A Multi-Space Distribution (MSD) Approach to Speech Recognition of Tonal Languages. In: Proc. ICSLP 2006 (2006)
6. Tokuda, K., Masuko, T., Miyazaki, N., Kobayashi, T.: Multi-space Probability Distribution HMM. IEICE Trans. Inf. & Syst. E85-D(3), 455–464 (2002)
7. Lin, C.H., Wu, C.H., Ting, P.Y., Wang, H.M.: Framework for Recognition of Mandarin Syllables with Tones Using Sub-syllabic Units. Journal of Speech Communication 18(2), 175–190 (1996)
8. Qian, Y., Soong, F.K., Lee, T.: Tone-enhanced Generalized Character Posterior Probability (GCPP) for Cantonese LVCSR. In: Proc. ICASSP 2006, pp. 133–136 (2006)
9. Tian, Y., Zhou, J.-L., Chu, M., Chang, E.: Tone Recognition with Fractionized Models and Outlined Features. In: Proc. ICASSP 2004, pp. 105–108 (2004)
10. Qian, Y.: Use of Tone Information in Cantonese LVCSR Based on Generalized Character Posterior Probability Decoding. PhD thesis, CUHK (2005)

Author information

Authors and Affiliations

Department of Computer Science and Technology, Harbin Institute of Technology,
Huanliang Wang & Jiqing Han
Microsoft Research Asia, Beijing
Yao Qian, Frank Soong & Jian-Lai Zhou

Editor information

Editors and Affiliations

Department of Computer Science, The University of Hong Kong, Hong Kong
Qiang Huo
Human Language Technology Department, Institute for Infocomm Research (I2R), 119613, Singapore
Bin Ma
School of Computer Engineering, Nanyang Technological University (NTU), 639798, Singapore
Eng-Siong Chng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Haizhou Li

About this paper

Cite this paper

Wang, H., Qian, Y., Soong, F., Zhou, JL., Han, J. (2006). Improved Mandarin Speech Recognition by Lattice Rescoring with Enhanced Tone Models. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_47

DOI: https://doi.org/10.1007/11939993_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)
