Abstract
Harmonics plus Noise Model (HNM) is a well-known speech signal representation technique that allows high-quality modifications to be applied to the signal in text-to-speech systems, providing greater flexibility than TD-PSOLA-based synthesis systems. In this paper, an adaptation of the adaptive pre-emphasis linear prediction technique for modifying vocal effort using the HNM speech representation is presented. The proposed transformation methodology is validated using a copy re-synthesis strategy on a speech corpus specifically designed with three levels of vocal effort (soft, modal and loud). The results of a perceptual test demonstrate the effectiveness of the proposed technique in performing all the vocal effort conversions for the given corpus.
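The core operation described in the abstract, re-shaping the spectral tilt of the HNM harmonic amplitudes to change perceived vocal effort, can be illustrated with a minimal sketch. It assumes a single stationary frame whose harmonics sit at exact multiples of f0, models the source tilt with a first-order (pre-emphasis) predictor in the spirit of adaptive pre-emphasis linear prediction, and treats the target coefficient `alpha_target` as a given input. All function names, the energy-preservation step and the example values are hypothetical; the noise part of HNM is left untouched, and this is not the authors' implementation.

```python
import numpy as np


def preemphasis_coefficient(amps, freqs, fs):
    """Optimal first-order LP (pre-emphasis) coefficient of a harmonic frame.

    For x[n] = sum_k A_k cos(w_k n + phi_k), the long-term autocorrelation is
    r[m] = sum_k (A_k**2 / 2) cos(w_k m), so the first-order predictor
    coefficient that minimises the prediction error is r[1] / r[0].
    """
    w = 2.0 * np.pi * np.asarray(freqs, dtype=float) / fs
    power = 0.5 * np.asarray(amps, dtype=float) ** 2
    return np.sum(power * np.cos(w)) / np.sum(power)


def tilt_magnitude(alpha, freqs, fs):
    """Magnitude of the pre-emphasis filter 1 - alpha z^-1 at the given frequencies."""
    w = 2.0 * np.pi * np.asarray(freqs, dtype=float) / fs
    return np.abs(1.0 - alpha * np.exp(-1j * w))


def modify_tilt(amps, f0, fs, alpha_target, preserve_energy=True):
    """Replace the estimated source tilt of the harmonic amplitudes by a target tilt.

    Multiplying by |1 - alpha_src e^{-jw}| flattens (whitens) the source tilt;
    dividing by |1 - alpha_target e^{-jw}| imposes the all-pole target tilt.
    A larger alpha gives a steeper low-pass tilt (softer voice), a smaller
    alpha a flatter spectrum (louder voice).
    """
    amps = np.asarray(amps, dtype=float)
    freqs = f0 * np.arange(1, len(amps) + 1)
    alpha_src = preemphasis_coefficient(amps, freqs, fs)
    new_amps = (amps * tilt_magnitude(alpha_src, freqs, fs)
                / tilt_magnitude(alpha_target, freqs, fs))
    if preserve_energy:
        # Hypothetical choice: keep the frame energy so that only the tilt changes.
        new_amps *= np.sqrt(np.sum(amps ** 2) / np.sum(new_amps ** 2))
    return new_amps, alpha_src


def synthesize_harmonics(amps, f0, fs, n_samples, phases=None):
    """Stationary HNM harmonic part: s[n] = sum_k A_k cos(2 pi k f0 n / fs + phi_k)."""
    amps = np.asarray(amps, dtype=float)
    k = np.arange(1, len(amps) + 1)[:, None]
    phases = np.zeros(len(amps)) if phases is None else np.asarray(phases, dtype=float)
    n = np.arange(n_samples)[None, :]
    return np.sum(amps[:, None] * np.cos(2.0 * np.pi * f0 * k * n / fs + phases[:, None]), axis=0)


if __name__ == "__main__":
    fs, f0 = 16000.0, 200.0
    amps = 1.0 / np.arange(1, 21)  # hypothetical, steeply tilted "soft" frame
    louder, alpha_src = modify_tilt(amps, f0, fs, alpha_target=0.5)
    frame = synthesize_harmonics(louder, f0, fs, n_samples=800)
    print(f"estimated source alpha = {alpha_src:.3f}")
```

With a target coefficient below the estimated one, the frame is pushed toward a flatter, louder-sounding spectrum; a target above it would move toward a softer voice. In a complete HNM system the modified amplitudes would presumably be applied frame by frame and combined with the noise part through overlap-add synthesis.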
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Calzada, À., Socoró, J.C. (2011). Vocal Effort Modification through Harmonics Plus Noise Model Representation. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science, vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_13
DOI: https://doi.org/10.1007/978-3-642-25020-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25019-4
Online ISBN: 978-3-642-25020-0
eBook Packages: Computer Science, Computer Science (R0)