Non-linear Pitch Modification in Voice Conversion Using Artificial Neural Networks

Bollepalli, Bajibabu; Beskow, Jonas; Gustafson, Joakim

doi:10.1007/978-3-642-38847-7_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7911)

Included in the following conference series:

International Conference on Nonlinear Speech Processing

Abstract

The majority of current voice conversion methods do not model local variations of the pitch contour; they apply only a linear modification of the pitch values, based on means and standard deviations. However, a significant amount of speaker-related information is also present in the pitch contour. In this paper we propose a non-linear pitch modification method that maps the pitch contours of the source speaker to those of the target speaker. This work is done within the framework of Artificial Neural Network (ANN) based voice conversion. The pitch contours are represented by Discrete Cosine Transform (DCT) coefficients at the segmental level. Results from subjective and objective evaluations confirm that the proposed method mimics the target speaker’s speaking style better than the linear modification method.
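
The two ideas contrasted in the abstract can be illustrated with a minimal sketch. It shows a mean/standard-deviation mapping of pitch values (the linear baseline) and a DCT parameterization of one pitch-contour segment. This is not the authors' implementation: the function names, the example F0 values, the speaker statistics, the choice of four coefficients, and the use of raw Hz rather than log-F0 are all assumptions made for illustration, and the ANN that would map source coefficients to target coefficients is omitted.

```python
import math

def linear_pitch_map(f0, src_mean, src_std, tgt_mean, tgt_std):
    """Baseline: shift and scale voiced F0 values using speaker statistics.

    A value of 0 is treated as unvoiced and left unchanged (an assumption).
    """
    return [tgt_mean + (v - src_mean) * tgt_std / src_std if v > 0 else 0.0
            for v in f0]

def _scale(k, n):
    # Orthonormal DCT-II scaling factor for coefficient index k.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct2(x):
    """Orthonormal DCT-II of a sequence of F0 values."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n))
            for k in range(n)]

def idct2(c, n):
    """Inverse of dct2 for an n-point signal; c may hold fewer than n
    coefficients, in which case the result is a smoothed contour."""
    return [sum(_scale(k, n) * ck * math.cos(math.pi * (i + 0.5) * k / n)
                for k, ck in enumerate(c))
            for i in range(n)]

# Hypothetical voiced segment (Hz) and hypothetical speaker statistics.
segment = [110.0, 118.0, 127.0, 133.0, 131.0, 124.0, 115.0, 108.0]
converted = linear_pitch_map(segment, 120.0, 10.0, 210.0, 25.0)

# Compact description of the contour shape: keep the first 4 coefficients.
coeffs = dct2(segment)
smoothed = idct2(coeffs[:4], len(segment))
```

The linear mapping rescales every value with the same two statistics, so the shape of the contour is preserved up to shift and scale. The low-order DCT coefficients, by contrast, describe the segment's level, slope, and curvature, which is the kind of representation a non-linear mapping could operate on.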

Author information

Authors and Affiliations

Department of Speech, Music and Hearing, KTH, Sweden
Bajibabu Bollepalli, Jonas Beskow & Joakim Gustafson

Editor information

Editors and Affiliations

TCTS Lab, University of Mons, 31, Boulevard Dolez, 7000, Mons, Belgium
Thomas Drugman
TCTS Lab, University of Mons, 31, Boulevard Dolez, 7000, Mons, Belgium
Thierry Dutoit

Cite this paper

Bollepalli, B., Beskow, J., Gustafson, J. (2013). Non-linear Pitch Modification in Voice Conversion Using Artificial Neural Networks. In: Drugman, T., Dutoit, T. (eds) Advances in Nonlinear Speech Processing. NOLISP 2013. Lecture Notes in Computer Science, vol 7911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38847-7_13

Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38846-0
Online ISBN: 978-3-642-38847-7
eBook Packages: Computer Science (R0)