Abstract
The majority of current voice conversion methods do not model local variations of the pitch contour; they apply only a linear modification of the pitch values, based on means and standard deviations. However, the pitch contour also carries a significant amount of speaker-related information. In this paper, we propose a non-linear pitch modification method that maps the pitch contours of the source speaker to those of the target speaker. The work is carried out within the framework of Artificial Neural Network (ANN) based voice conversion. The pitch contours are represented with Discrete Cosine Transform (DCT) coefficients at the segmental level. Results from subjective and objective evaluations confirm that the proposed method mimics the target speaker’s speaking style better than the linear modification method.
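The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of log-F0 for the linear baseline, the orthonormal DCT-II variant, and the convention that unvoiced frames are stored as 0 are all assumptions made for illustration. The ANN that maps source DCT coefficients to target DCT coefficients is not shown.

```python
import numpy as np


def linear_f0_transform(f0_src, src_stats, tgt_stats):
    """Baseline linear modification using means and standard deviations.

    Assumption: statistics are (mean, std) of log-F0 over voiced frames,
    and unvoiced frames are marked with F0 == 0.
    """
    mu_s, sigma_s = src_stats
    mu_t, sigma_t = tgt_stats
    f0_src = np.asarray(f0_src, dtype=float)
    out = np.zeros_like(f0_src)
    voiced = f0_src > 0
    log_f0 = np.log(f0_src[voiced])
    # Shift and scale in the log domain, then return to Hz.
    out[voiced] = np.exp(mu_t + (sigma_t / sigma_s) * (log_f0 - mu_s))
    return out


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)
    return c


def contour_to_dct(segment, n_coeffs):
    """Represent one voiced pitch segment by its first n_coeffs DCT coefficients."""
    segment = np.asarray(segment, dtype=float)
    return (dct_matrix(len(segment)) @ segment)[:n_coeffs]


def dct_to_contour(coeffs, length):
    """Reconstruct a pitch segment of the given length from DCT coefficients."""
    full = np.zeros(length)
    full[: len(coeffs)] = coeffs
    # The basis is orthonormal, so the inverse is the transpose.
    return dct_matrix(length).T @ full


if __name__ == "__main__":
    segment = np.array([120.0, 125.0, 131.0, 128.0, 122.0, 118.0])  # made-up F0 values in Hz
    coeffs = contour_to_dct(segment, 3)
    print(dct_to_contour(coeffs, len(segment)))  # smoothed approximation of the segment
```

Keeping only the first few coefficients gives a compact, fixed-length description of each segment's shape, which is the kind of representation a network can map from source to target. The linear baseline, by contrast, rescales every frame with the same global statistics and so cannot change the local shape of the contour.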
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bollepalli, B., Beskow, J., Gustafson, J. (2013). Non-linear Pitch Modification in Voice Conversion Using Artificial Neural Networks. In: Drugman, T., Dutoit, T. (eds) Advances in Nonlinear Speech Processing. NOLISP 2013. Lecture Notes in Computer Science, vol 7911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38847-7_13
DOI: https://doi.org/10.1007/978-3-642-38847-7_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38846-0
Online ISBN: 978-3-642-38847-7
eBook Packages: Computer Science, Computer Science (R0)