
Speech Synthesis Based on Gaussian Conditional Random Fields

  • Conference paper
Artificial Intelligence and Signal Processing (AISP 2013)

Abstract

Hidden Markov Model (HMM)-based speech synthesis (HTS) has recently been established as the most effective method for generating natural speech. However, it generalizes poorly across contexts when the training data is limited. As a solution, the current study provides a new context-dependent speech modeling framework based on Gaussian Conditional Random Field (GCRF) theory. Using this model, a speech synthesis system has been developed that can be viewed as an extension of the Context-Dependent Hidden Semi-Markov Model (CD-HSMM). Model parameters are trained with a novel Viterbi decoder combined with a stochastic gradient ascent algorithm, and a fast and efficient parameter generation algorithm is derived for the synthesis part. Experimental results using objective and subjective criteria show that the proposed system substantially outperforms HSMM on limited speech databases. Moreover, the Mel-cepstral distance of the spectral parameters is reduced considerably for every size of training database.
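The abstract reports its objective results as Mel-cepstral distance. As a rough illustration only, the sketch below computes one commonly used form of that measure, in dB, between two frame-aligned sequences of Mel-cepstral coefficient vectors. The function name, the exclusion of the 0th (energy) coefficient, and the assumption that the two sequences are already time-aligned are illustrative choices; the abstract does not state which variant the authors used.

```python
import math


def mel_cepstral_distance(ref_frames, syn_frames):
    """Mean Mel-cepstral distance (dB) between two aligned coefficient sequences.

    Each argument is a list of frames; each frame is a list of Mel-cepstral
    coefficients with the 0th (energy) coefficient first. Assumption: the
    sequences are already time-aligned and have the same number of frames.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("sequences must be non-empty and of equal length")

    # Constant that converts a Euclidean cepstral distance into dB.
    scale = 10.0 * math.sqrt(2.0) / math.log(10.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same number of coefficients")
        # Skip index 0 so that overall energy does not dominate the distance.
        squared_error = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared_error)

    # Average the per-frame distances over the utterance.
    return total / len(ref_frames)
```

A lower value means the synthesized spectral parameters are closer to those of natural speech, which is the sense in which the abstract describes the distance as "reduced".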



Author information


Corresponding author

Correspondence to Soheil Khorram.


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Khorram, S., Bahmaninezhad, F., Sameti, H. (2014). Speech Synthesis Based on Gaussian Conditional Random Fields. In: Movaghar, A., Jamzad, M., Asadi, H. (eds) Artificial Intelligence and Signal Processing. AISP 2013. Communications in Computer and Information Science, vol 427. Springer, Cham. https://doi.org/10.1007/978-3-319-10849-0_19


  • DOI: https://doi.org/10.1007/978-3-319-10849-0_19


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10848-3

  • Online ISBN: 978-3-319-10849-0

  • eBook Packages: Computer Science, Computer Science (R0)
