Skip to main content

Performance Evaluation for Voice Conversion Systems

  • Conference paper
Book cover Text, Speech and Dialogue (TSD 2008)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5246))

Included in the following conference series:

Abstract

In the present work, we introduce a new performance evaluation measure for assessing the capacity of voice conversion systems to modify the speech of one speaker (source) so that it sounds as if it was uttered by another speaker (target). This measure relies on a GMM-UBM-based likelihood estimator that estimates the degree of proximity between an utterance of the converted voice and the predefined models of the source and target voices. The proposed approach allows the formulation of an objective criterion, which is applicable for both evaluation of the virtue of a single system and for direct comparison (benchmarking) among different voice conversion systems. To illustrate the functionality and the practical usefulness of the proposed measure, we contrast it with four well-known objective evaluation criteria.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Abe, M., Nakamura, S., Shikano, K., Kuwabara, H.: Voice conversion through vector quantization. In: Proc. ICASSP 1988, USA, pp. 655–658 (1988)

    Google Scholar 

  2. Kreiman, J., Papcun, G.: Comparing, discrimination and recognition of unfamiliar voices. Speech Communication 10(3), 265–275 (1991)

    Article  Google Scholar 

  3. Methods for subjective determination of transmission quality, Tech. Rep. ITU-T Recommendation P.800, ITU, Switzerland (1996)

    Google Scholar 

  4. Arslan, L.M.: Speaker transformation algorithm using segmental codebooks (STASC). Speech Communication 28(3), 211–226 (1999)

    Article  Google Scholar 

  5. Kain, A.: High resolution voice transformation. Ph.D. dissertation, OGI, Portland, USA (2001)

    Google Scholar 

  6. Reynolds, D.A., Quatieri, T.F., Dunn, R.B.: Speaker verification using adapted Gaussian mixture models. Digital Signal Processing 10(1-3), 19–41 (2000)

    Article  Google Scholar 

  7. Sündermann, D., Ney, H., Höge, H.: VTLN-based cross-language voice conversion. In: Proc. ASRU 2003, USA, pp. 676–681 (2003)

    Google Scholar 

  8. Stylianou, Y., Cappé, O., Moulines, E.: Continuous probabilistic transform for voice conversion. IEEE Trans. Speech and Audio Processing 6(2), 131–142 (1998)

    Article  Google Scholar 

  9. Sündermann, D., Bonafonte, A., Ney, H., Höge, H.: A study on residual prediction techniques for voice conversion. In: Proc. ICASSP 2005, USA, vol. 1, pp. 13–16 (2005)

    Google Scholar 

  10. Kominek, J., Black, A.: The CMU ARCTIC speech databases for speech synthesis research. Technical Report CMU-LTI-03-177, Carnegie Mellon University, Pittsburgh, PA (2003)

    Google Scholar 

  11. Slaney, M.: Auditory toolbox. Version 2. Technical Report #1998-010, Interval Research Corporation (1998)

    Google Scholar 

  12. Rabiner, L.R., Cheng, M.J., Rosenberg, A.E., McGonegal, C.A.: A comparative performance study of several pitch detection algorithms. IEEE Trans. Acoust. Speech & Signal Proc. 24(5), 399–418 (1976)

    Article  Google Scholar 

  13. Garofolo, J.: Getting started with the DARPA-TIMIT CD-ROM: An acoustic phonetic continuous speech database. National Institute of Standards and Technology (NIST), USA (1998)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Petr Sojka Aleš Horák Ivan Kopeček Karel Pala

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ganchev, T., Lazaridis, A., Mporas, I., Fakotakis, N. (2008). Performance Evaluation for Voice Conversion Systems. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science(), vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_41

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-87391-4_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87390-7

  • Online ISBN: 978-3-540-87391-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics