Abstract
Hidden Markov Model (HMM)-based speech synthesis (HTS) has recently been confirmed to be the most effective method for generating natural speech. However, it generalizes poorly across contexts when the training data is limited. As a solution, the current study provides a new context-dependent speech modeling framework based on Gaussian Conditional Random Field (GCRF) theory. Using this model, a speech synthesis system has been developed that can be viewed as an extension of the Context-Dependent Hidden Semi-Markov Model (CD-HSMM). A novel Viterbi decoder and a stochastic gradient ascent algorithm were applied to train the model parameters, and a fast and efficient parameter generation algorithm was derived for the synthesis part. Experimental results using objective and subjective criteria show that the proposed system substantially outperforms HSMM when the speech database is limited. Moreover, the Mel-cepstral distance of the spectral parameters is reduced considerably for all sizes of training database.
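The abstract does not spell out how the Mel-cepstral distance is computed. As a rough illustration only, the sketch below follows the commonly used definition of the measure in dB, averaged over time-aligned frames. The function name `mel_cepstral_distance`, the list-of-lists input layout, and the choice to skip the zeroth (energy) coefficient are assumptions made for this example, not details taken from the paper.

```python
import math


def mel_cepstral_distance(ref_frames, syn_frames):
    """Mean Mel-cepstral distance in dB between two time-aligned sequences.

    Each argument is a list of frames; each frame is a list of Mel-cepstral
    coefficients. Hypothetical helper written for illustration only.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("sequences must be non-empty and time-aligned")

    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Assumption: coefficient 0 (the energy term) is excluded,
        # which is a common convention for this measure.
        squared_error = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared_error)
    return total / len(ref_frames)


if __name__ == "__main__":
    natural = [[1.0, 0.50, -0.20], [0.9, 0.40, -0.10]]
    synthetic = [[1.1, 0.45, -0.25], [0.8, 0.42, -0.05]]
    print(f"{mel_cepstral_distance(natural, synthetic):.3f} dB")
```

A lower value means the synthesized spectral parameters lie closer to those of natural speech, which is the sense in which the abstract reports a reduction.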
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Khorram, S., Bahmaninezhad, F., Sameti, H. (2014). Speech Synthesis Based on Gaussian Conditional Random Fields. In: Movaghar, A., Jamzad, M., Asadi, H. (eds) Artificial Intelligence and Signal Processing. AISP 2013. Communications in Computer and Information Science, vol 427. Springer, Cham. https://doi.org/10.1007/978-3-319-10849-0_19
DOI: https://doi.org/10.1007/978-3-319-10849-0_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10848-3
Online ISBN: 978-3-319-10849-0
eBook Packages: Computer Science (R0)