
Evaluating Voice Quality and Speech Synthesis Using Crowdsourcing

  • Conference paper
Text, Speech, and Dialogue (TSD 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Abstract

One of the key aspects of creating high-quality synthetic speech is the validation process. Establishing validation processes that are both reliable and scalable is challenging. Today, the maturity of crowdsourcing infrastructure, along with better techniques for validating the data gathered through crowdsourcing, has made it possible to perform reliable speech synthesis validation at a larger scale. In this paper, we present a study of voice quality evaluation using a crowdsourcing platform. We investigate voice gender preference across eight locales for three typical TTS scenarios. We also examine to what degree speaker adaptation can carry over certain voice qualities, such as mood, from the target speaker to the adapted TTS. Starting from an existing full TTS font, adaptation is carried out on a smaller amount of speech data from a target speaker. Finally, we show how crowdsourcing contributes to objective assessment when dealing with voice preference in voice talent selection.
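The kind of crowdsourced preference evaluation the abstract describes typically combines two steps: screening out unreliable workers using gold (control) questions with known answers, then aggregating the surviving votes per test item. The following is a minimal sketch of that common pattern in Python; the worker IDs, scenario names, vote data, and accuracy threshold are all hypothetical illustrations, not values from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical pairwise preference votes: (worker_id, item_id, choice),
# where choice is 'M' (male voice) or 'F' (female voice).
VOTES = [
    ("w1", "news", "F"), ("w2", "news", "F"), ("w3", "news", "M"),
    ("w1", "nav",  "M"), ("w2", "nav",  "M"), ("w3", "nav",  "F"),
]

# Gold (control) items with known answers, used to screen workers
# before their real votes are counted.
GOLD_ANSWERS = {"gold1": "F"}
GOLD_VOTES = [("w1", "gold1", "F"), ("w2", "gold1", "F"), ("w3", "gold1", "M")]

def reliable_workers(gold_votes, gold_answers, min_accuracy=0.8):
    """Return the set of workers whose accuracy on gold items meets the bar."""
    correct, total = defaultdict(int), defaultdict(int)
    for worker, item, choice in gold_votes:
        total[worker] += 1
        if choice == gold_answers[item]:
            correct[worker] += 1
    return {w for w in total if correct[w] / total[w] >= min_accuracy}

def preference_by_item(votes, keep):
    """Majority-vote winner per item, counting only screened-in workers."""
    tallies = defaultdict(Counter)
    for worker, item, choice in votes:
        if worker in keep:
            tallies[item][choice] += 1
    return {item: c.most_common(1)[0][0] for item, c in tallies.items()}

keep = reliable_workers(GOLD_VOTES, GOLD_ANSWERS)
print(preference_by_item(VOTES, keep))  # w3 failed the gold check, so its votes drop out
```

In practice, platforms such as Mechanical Turk interleave gold items invisibly among real tasks; the threshold and the aggregation rule (simple majority here) are design choices that the crowdsourcing literature cited by such studies discusses at length.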




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Parson, J., Braga, D., Tjalve, M., Oh, J. (2013). Evaluating Voice Quality and Speech Synthesis Using Crowdsourcing. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_30

  • DOI: https://doi.org/10.1007/978-3-642-40585-3_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40584-6

  • Online ISBN: 978-3-642-40585-3

  • eBook Packages: Computer Science (R0)
