Speech Synthesis Based on Gaussian Conditional Random Fields

Khorram, Soheil; Bahmaninezhad, Fahimeh; Sameti, Hossein

doi:10.1007/978-3-319-10849-0_19

Soheil Khorram⁴,
Fahimeh Bahmaninezhad⁴ &
Hossein Sameti⁴

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 427))

Included in the following conference series:

International Symposium on Artificial Intelligence and Signal Processing

977 Accesses
2 Citations

Abstract

Hidden Markov Model (HMM)-based synthesis (HTS) has recently been confirmed to be the most effective method in generating natural speech. However, it lacks adequate context generalization when the training data is limited. As a solution, current study provides a new context-dependent speech modeling framework based on the Gaussian Conditional Random Field (GCRF) theory. By applying this model, an innovative speech synthesis system has been developed which can be viewed as an extension of Context-Dependent Hidden Semi Markov Model (CD-HSMM). A novel Viterbi decoder along with a stochastic gradient ascent algorithm was applied to train model parameters. Also, a fast and efficient parameter generation algorithm was derived for the synthesis part. Experimental results using objective and subjective criteria have shown that the proposed system outperforms HSMM substantially in limited speech databases. Moreover, Mel-cepstral distance of the spectral parameters has been reduced considerably for any size of training database.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Black, A.W., Zen, H., Tokuda, K.: Statistical Parametric Speech Synthesis. In: ICASSP’2007, Honolulu, Hawai’i, USA, pp. IV-1229–IV-1232 (2007)
Google Scholar
Zen, H., Tokuda, K., Black, A.W.: Statistical parametric speech synthesis. Speech Commun. Elsevier 51(11), 1039–1064 (2009)
Article Google Scholar
Zen, H., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T.: Hidden semi-markov model based speech synthesis. In: Interspeech’2004, Jeju Island, Korea, pp. 1393–1396, October 4–8 2004
Google Scholar
Zen, H., Tokuda, K., Kitamura, T.: An introduction of trajectory model into hmm-based speech synthesis. In: SSW5, pp. 191–196. Carnegie Mellon University, June 2004
Google Scholar
Tokuda, K., Masuko, T., Miyazaki, N., Kobayashi, T.: Multi-space probability distribution HMM. IEICE Trans. Inf. Syst. E85-D(3), 455–464 (2002)
Google Scholar
Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning, pp. 282–289 (2001)
Google Scholar
Grimmett, G.R.: A theorem about random fields. Bull. Lond. Math. Soc. 5, 81–84 (1973)
Article MathSciNet MATH Google Scholar
Sutton, C., McCallum, A.: An introduction to conditional random fields for relational learning. In: Getoor, L., Taskar, B. (eds.) Introduction to statistical Relational Learning. MIT Press, Cambridge (2006)
Google Scholar
Gardner, W.A.: Learning characteristics of stochastic-gradient-descent algorithms: a general study, analysis and critique. Sig. Process. 6(2), 113–133 (1984)
Article Google Scholar
Vrahatis, M.N., Androulakis, G.S., Lambrinos, J.N., Magoulas, G.D.: A class of gradient unconstrained minimization algorithms with adaptive stepsize. J. Comput. Appl. Math. 114(2), 367–386 (2000)
Article MathSciNet MATH Google Scholar
Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T., Kitamura, T.: Speech parameter generation algorithms for HMM-based speech synthesis. In: ICASSP’2000, vol. 3, Istanbul, pp. 1315–1318, June 2000
Google Scholar
Bijankhan, M., Sheikhzadegan, J., Roohani, M.R., Samareh, Y., Lucas, C., Tebiani, M.: The speech database of farsi spoken language. In: Proceedings of 5th Australian International Conference on Speech Science and Technology (SST’94), pp. 826–831 (1994)
Google Scholar
Kawahara, H., Masuda-Katsuse, I., de Cheveigné, A.: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Commun. 27(3–4), 187–207 (1999)
Article Google Scholar
Kubichek, R.F.: Mel-cepstral distance measure for objective speech quality assessment. In: Proceedings of the IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing, pp. 125–128 (1993)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Soheil Khorram, Fahimeh Bahmaninezhad & Hossein Sameti

Authors

Soheil Khorram
View author publications
You can also search for this author in PubMed Google Scholar
Fahimeh Bahmaninezhad
View author publications
You can also search for this author in PubMed Google Scholar
Hossein Sameti
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Soheil Khorram .

Editor information

Editors and Affiliations

Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Ali Movaghar
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Mansour Jamzad
Sharif University of Technology, Tehran, Iran
Hossein Asadi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Khorram, S., Bahmaninezhad, F., Sameti, H. (2014). Speech Synthesis Based on Gaussian Conditional Random Fields. In: Movaghar, A., Jamzad, M., Asadi, H. (eds) Artificial Intelligence and Signal Processing. AISP 2013. Communications in Computer and Information Science, vol 427. Springer, Cham. https://doi.org/10.1007/978-3-319-10849-0_19

Download citation

DOI: https://doi.org/10.1007/978-3-319-10849-0_19
Published: 26 September 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10848-3
Online ISBN: 978-3-319-10849-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics