Voice Conversion Based on Weighted Least Squares Estimation Criterion and Residual Prediction from Pitch Contour

Zhang, Jian; Sun, Jun; Dai, Beiqian

doi:10.1007/11573548_42

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3784)

Included in the following conference series:

International Conference on Affective Computing and Intelligent Interaction

Abstract

This paper describes an enhanced system for more efficient voice conversion. A weighted LMSE (least mean squared error) criterion is adopted, instead of the conventional LMSE criterion, for training the spectral conversion function. In addition, a short-term pitch contour mapping algorithm is presented, together with a new residual codebook formed from the pitch contour. Informal listening tests indicate that convincing voice conversion is achieved while high speech quality is maintained. Objective evaluations also show that the proposed system reduces speaker individual discrimination compared with the baseline system in an LPC-based analysis/synthesis framework.
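The abstract contrasts a weighted LMSE criterion with conventional LMSE for training the spectral conversion function. The exact form of the paper's conversion function and its weighting scheme are not given here. The following is therefore only a minimal sketch under stated assumptions, not the authors' method:

- A single affine mapping y ≈ A·[x, 1] is trained from source spectral feature vectors x to target vectors y.
- Each training frame t carries a caller-supplied weight w_t. The names `weighted_linear_map` and `convert`, and the per-frame weights, are hypothetical.
- The fit minimizes the sum of w_t·||y_t − A·[x_t, 1]||². With all weights equal to 1, this reduces to the conventional least-squares fit.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def weighted_linear_map(xs: Sequence[Sequence[float]],
                        ys: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> Matrix:
    """Hypothetical helper: weighted least-squares fit of y ~ A @ [x, 1].

    Minimizes sum_t w_t * ||y_t - A @ [x_t, 1]||^2 via the normal equations
    (X^T W X) a_k = X^T W y_k, solved separately for each output dimension k.
    Returns A as one row per output dimension: coefficients for x, then bias.
    """
    if not (len(xs) == len(ys) == len(weights)):
        raise ValueError("xs, ys and weights must have the same length")
    feats = [list(x) + [1.0] for x in xs]  # append a bias column
    p = len(feats[0])
    gram = [[sum(w * f[i] * f[j] for f, w in zip(feats, weights))
             for j in range(p)] for i in range(p)]
    rows = []
    for k in range(len(ys[0])):
        rhs = [sum(w * f[i] * y[k] for f, y, w in zip(feats, ys, weights))
               for i in range(p)]
        rows.append(solve_linear(gram, rhs))
    return rows


def convert(a: Matrix, x: Sequence[float]) -> List[float]:
    """Apply a fitted affine mapping to one source feature vector."""
    xb = list(x) + [1.0]
    return [sum(c * v for c, v in zip(row, xb)) for row in a]


if __name__ == "__main__":
    # Toy 1-D data: three consistent frames and one badly aligned frame.
    xs = [[0.0], [1.0], [2.0], [3.0]]
    ys = [[0.0], [1.0], [2.0], [10.0]]
    print("uniform weights:", weighted_linear_map(xs, ys, [1.0, 1.0, 1.0, 1.0]))
    print("down-weighted:  ", weighted_linear_map(xs, ys, [1.0, 1.0, 1.0, 0.01]))
```

In this toy run, the uniform (conventional) fit is pulled toward the misaligned frame, giving a slope of 3.1. Down-weighting that frame keeps the slope close to 1. A weighted criterion can emphasize some frames over others in this way. How the paper chooses its weights is not stated in the abstract.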

Author information

Authors and Affiliations

Electronic Science and Technology Department, University of Science and Technology of China, 230026, Hefei, Anhui, China
Jian Zhang, Jun Sun & Beiqian Dai

Editor information

Editors and Affiliations

National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China
Jianhua Tao
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
MIT Media Laboratory, 20 Ames Street, 02139, Cambridge, MA, USA
Rosalind W. Picard

About this paper

Cite this paper

Zhang, J., Sun, J., Dai, B. (2005). Voice Conversion Based on Weighted Least Squares Estimation Criterion and Residual Prediction from Pitch Contour. In: Tao, J., Tan, T., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2005. Lecture Notes in Computer Science, vol 3784. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573548_42

DOI: https://doi.org/10.1007/11573548_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29621-8
Online ISBN: 978-3-540-32273-3
eBook Packages: Computer Science, Computer Science (R0)
