Nonlinear Synthesis of Vowels in the LP Residual Domain with a Regularized RBF Network

Rank, Erhard; Kubin, Gernot

doi:10.1007/3-540-45723-2_90


Conference paper
First Online: 01 January 2001


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2085)

Abstract

In this paper we present a speech analysis/synthesis coder that combines linear prediction with nonlinear modeling of the residual by a regularized radial basis function (RBF) network. The model has been applied to the synthesis of sustained vowel signals and has been found to preserve the dynamics and spectra of the original speech signal. While several nonlinear speech models reportedly suffer from high-frequency losses in the synthesized speech due to system-inherent low-pass behavior, our approach achieves good speech signal reproduction even in the higher frequency ranges. The decomposition of the speech signal by linear prediction analysis supports processing during synthesis, such as pitch modification, while the nonlinear modeling provides the means for adequate reproduction of the fine-grained dynamic characteristics of speech.
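The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the LP order, the embedding dimension, the RBF type, the choice of centers, the regularization method, or the training procedure. Everything below is therefore an assumption made for illustration (autocorrelation-method LP, Gaussian basis functions, centers taken from the training data, a ridge penalty `lam`, and a synthetic harmonic test signal standing in for a vowel). All function names are hypothetical.

```python
import numpy as np

def lp_coefficients(x, order):
    """Autocorrelation-method LP: a[k-1] weights x[n-k] in the predictor."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]  # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])

def lp_residual(x, a):
    """Inverse filter: e[n] = x[n] - sum_k a[k] x[n-k], zero initial state."""
    e = x.copy()
    for k, ak in enumerate(a, start=1):
        e[k:] -= ak * x[:-k]
    return e

def lp_synthesis(e, a):
    """All-pole filter: y[n] = e[n] + sum_k a[k] y[n-k], zero initial state."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        past = y[max(0, n - len(a)) : n][::-1]  # y[n-1], y[n-2], ...
        y[n] = e[n] + a[: len(past)] @ past
    return y

def delay_embed(e, dim):
    """Rows are [e[n-1], ..., e[n-dim]]; targets are e[n]."""
    X = np.column_stack([e[dim - k - 1 : len(e) - k - 1] for k in range(dim)])
    return X, e[dim:]

def rbf_design(X, centers, width):
    """Gaussian basis functions evaluated for every (input, center) pair."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_rbf(X, y, centers, width, lam):
    """Ridge-regularized least squares for the RBF output weights."""
    Phi = rbf_design(X, centers, width)
    A = Phi.T @ Phi + lam * np.eye(len(centers))
    return np.linalg.solve(A, Phi.T @ y)

def free_run(w, centers, width, seed_state, n_samples):
    """Iterate the fitted predictor on its own output (autonomous oscillator)."""
    state = np.array(seed_state, dtype=float)  # [e[n-1], ..., e[n-dim]]
    out = np.empty(n_samples)
    for n in range(n_samples):
        out[n] = rbf_design(state[None, :], centers, width)[0] @ w
        state = np.concatenate(([out[n]], state[:-1]))
    return out

# Synthetic stand-in for a sustained vowel (assumption: not real speech data).
rng = np.random.default_rng(0)
fs, f0, n = 8000, 100.0, 2000
t = np.arange(n) / fs
x = sum(np.sin(2 * np.pi * h * f0 * t) / h for h in range(1, 11))
x = x + 0.01 * rng.standard_normal(n)

a = lp_coefficients(x, order=10)          # analysis: LP filter
e = lp_residual(x, a)                     # analysis: residual signal
X, y = delay_embed(e, dim=4)              # one-step prediction data
centers, width = X[::20], 2.0 * e.std()   # assumed center/width choice
w = fit_rbf(X, y, centers, width, lam=1e-2)

e_hat = free_run(w, centers, width, X[0], 400)  # synthesis: residual oscillator
y_hat = lp_synthesis(e_hat, a)                  # synthesis: back to signal domain
```

Keeping the LP filter and the residual model separate is what allows the two parts to be handled independently during synthesis. This sketch makes no claim about the quality of the free-running output, which depends strongly on the assumed parameters.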



Author information

Authors and Affiliations

Institute of Communications and Radio-Frequency Engineering, Vienna University of Technology, Gusshausstrasse 25/E389, A-1040 Vienna, Austria
Erhard Rank
Institute of Communications and Wave Propagation, Graz University of Technology, Inffeldgasse 16c, A-8010 Graz, Austria
Gernot Kubin


Editor information

Editors and Affiliations

Departamento de Inteligencia Artificial, Universidad Nacional de Educación a Distancia, Senda del Rey, s/n., 28040, Madrid, Spain
José Mira
Departamento de Arquitectura y Tecnología de Computadores, Universidad de Granada, Campus Fuentenueva, 18071, Granada, Spain
Alberto Prieto


About this paper

Cite this paper

Rank, E., Kubin, G. (2001). Nonlinear Synthesis of Vowels in the LP Residual Domain with a Regularized RBF Network. In: Mira, J., Prieto, A. (eds) Bio-Inspired Applications of Connectionism. IWANN 2001. Lecture Notes in Computer Science, vol 2085. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45723-2_90


DOI: https://doi.org/10.1007/3-540-45723-2_90
Published: 12 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42237-2
Online ISBN: 978-3-540-45723-7
eBook Packages: Springer Book Archive
