Learning Prosodic Patterns for Mandarin Speech Synthesis

Chen, Yiqiang; Gao, Wen; Zhu, Tingshao; Ling, Charles

doi:10.1023/A:1015568521453

Published: July 2002

Volume 19, pages 95–109 (2002)

Journal of Intelligent Information Systems

Abstract

Higher-quality synthesized speech is required for the widespread use of text-to-speech (TTS) technology. Prosody, which mainly describes the variation of pitch, is the key feature that determines whether synthetic speech sounds natural or unnatural and monotonous. The rules used in most Chinese TTS systems are constructed by experts, with weak quality control and low precision. In this paper, we propose combining clustering and machine learning techniques to extract prosodic patterns from large databases of real Mandarin speech, in order to improve the naturalness and intelligibility of synthesized speech. Typical prosody models are found by clustering analysis. Several machine learning techniques, including rough sets, artificial neural networks (ANNs), and decision trees, are trained to generate fundamental frequency (F0) and energy contours, which can be used directly in a TTS system based on pitch-synchronous overlap-add (PSOLA). The experimental results showed that the synthesized prosodic features closely resembled their original counterparts for most syllables.
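The abstract states that typical prosody models are found by clustering analysis but does not name the algorithm or the features. The sketch below illustrates only that clustering step under stated assumptions: each syllable's F0 contour (in Hz) is linearly resampled to a fixed number of points and grouped with plain k-means. The function names (`resample`, `kmeans`, `nearest_template`), the resampling length of 5, and the toy contours are hypothetical and chosen for illustration; this is not the authors' method.

```python
import math
from typing import List, Sequence, Tuple

Contour = List[float]


def resample(contour: Sequence[float], n_points: int = 5) -> Contour:
    """Linearly resample a variable-length F0 contour (Hz) to n_points samples.

    Assumes n_points >= 2 and a non-empty contour (illustrative choice).
    """
    if len(contour) == 1:
        return [float(contour[0])] * n_points
    out: Contour = []
    last = len(contour) - 1
    for i in range(n_points):
        # Position of the i-th output sample on the original index axis.
        pos = i * last / (n_points - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Contour], k: int, iters: int = 20) -> Tuple[List[Contour], List[int]]:
    """Plain k-means; returns (templates, label per vector). Requires len(vectors) >= k."""
    # Deterministic initialization from the first k vectors (k-means++ would be more robust).
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each contour joins its nearest template.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each template becomes the mean of its member contours.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def nearest_template(contour: Sequence[float], centroids: List[Contour]) -> int:
    """Index of the template closest to a new contour after resampling."""
    v = resample(contour, len(centroids[0]))
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


if __name__ == "__main__":
    # Hypothetical toy contours: two rising (tone-2-like) and two falling (tone-4-like) syllables.
    raw = [[100, 120, 140], [200, 160, 120], [110, 130, 150, 170], [210, 180, 150, 120]]
    templates, labels = kmeans([resample(c) for c in raw], k=2)
    print(labels)  # [0, 1, 0, 1]
    print(nearest_template([95, 115, 135, 150], templates))  # 0
```

On the toy data the two templates come out as one rising and one falling shape. A real system would likely normalize for each speaker's pitch range before clustering. In the approach the abstract describes, such templates would presumably serve as targets that learners such as rough sets, ANNs, or decision trees predict from textual context; that step is not shown here.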

Author information

Authors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, People's Republic of China, 100080
Yiqiang Chen & Wen Gao
Department of Computing Science, University of Alberta, Edmonton, Canada, T6G 2E1
Tingshao Zhu
Department of Computer Science, University of Western Ontario, London, Ontario, Canada, N6A 5B7
Charles Ling

About this article

Cite this article

Chen, Y., Gao, W., Zhu, T. et al. Learning Prosodic Patterns for Mandarin Speech Synthesis. Journal of Intelligent Information Systems 19, 95–109 (2002). https://doi.org/10.1023/A:1015568521453
