Nonlinear Synthesis of Vowels in the LP Residual Domain with a Regularized RBF Network

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2085)

Abstract

In this paper we present a speech analysis/synthesis coder that combines linear prediction with nonlinear modeling of the residual by a regularized radial basis function (RBF) network. The model has been applied to the synthesis of sustained vowel signals and has been found to preserve the dynamics and spectra of the original speech signal. Whereas several nonlinear speech models reportedly suffer from high-frequency losses in the synthesized speech due to system-inherent low-pass behavior, our approach achieves good speech signal reproduction even in the higher frequency ranges. Decomposing the speech signal by linear prediction analysis supports processing during synthesis, such as pitch modification, while the nonlinear modeling provides the means for adequate reproduction of the fine-grained dynamic characteristics of speech.
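The two-stage idea in the abstract (linear prediction first, then a regularized RBF model of the residual) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy signal, the function names, the choice of centers, the Gaussian width, the ridge-style regularizer `lam`, and the embedding dimension are all assumptions made for illustration. The sketch performs only one-step prediction of the residual; a synthesizer would presumably feed predictions back and pass the result through the LP synthesis filter.

```python
import numpy as np

def synthetic_vowel(n=1000, period=80, seed=0):
    """Toy stand-in for a sustained vowel (assumption): pulse train
    plus weak noise through a stable two-pole resonator."""
    rng = np.random.default_rng(seed)
    exc = 0.01 * rng.standard_normal(n)
    exc[::period] += 1.0
    x = np.zeros(n)
    for i in range(n):
        x[i] = exc[i]
        if i >= 1:
            x[i] += 1.6 * x[i - 1]
        if i >= 2:
            x[i] -= 0.81 * x[i - 2]
    return x

def lp_coefficients(x, order):
    """Autocorrelation-method LP: solve the normal equations R a = r."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def lp_residual(x, a):
    """Inverse filter: e[n] = x[n] - sum_k a[k] * x[n-1-k]."""
    e = x.copy()
    for k in range(len(a)):
        e[k + 1:] -= a[k] * x[:len(x) - k - 1]
    return e

def embed(e, dim):
    """Delay embedding: rows [e[n-dim], ..., e[n-1]], targets e[n]."""
    X = np.stack([e[i:len(e) - dim + i] for i in range(dim)], axis=1)
    return X, e[dim:]

def rbf_design(X, centers, width):
    """Gaussian basis activations for every (sample, center) pair."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_rbf(X, y, centers, width, lam):
    """Regularized least squares for the output weights (ridge penalty)."""
    Phi = rbf_design(X, centers, width)
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

def analyse_and_fit(x, order=4, dim=3, n_centers=20, width=1.0, lam=1e-3):
    """LP analysis, then a one-step RBF predictor on the normalized residual."""
    x = np.asarray(x, dtype=float)
    a = lp_coefficients(x, order)
    e = lp_residual(x, a)
    X, y = embed(e / np.std(e), dim)
    # Centers taken as an evenly spaced subset of training points (assumption).
    centers = X[::max(1, len(X) // n_centers)][:n_centers]
    w = fit_rbf(X, y, centers, width, lam)
    y_hat = rbf_design(X, centers, width) @ w
    return a, e, y, y_hat
```

Calling `analyse_and_fit(synthetic_vowel())` returns the LP coefficients, the residual, and the one-step targets and predictions in the normalized residual domain. The regularization term keeps the weight solve well conditioned when basis functions overlap strongly, which is the role a regularizer plays in this kind of fit.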

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rank, E., Kubin, G. (2001). Nonlinear Synthesis of Vowels in the LP Residual Domain with a Regularized RBF Network. In: Mira, J., Prieto, A. (eds) Bio-Inspired Applications of Connectionism. IWANN 2001. Lecture Notes in Computer Science, vol 2085. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45723-2_90

  • DOI: https://doi.org/10.1007/3-540-45723-2_90

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42237-2

  • Online ISBN: 978-3-540-45723-7

  • eBook Packages: Springer Book Archive
