
Expressive Speech Recognition and Synthesis as Enabling Technologies for Affective Robot-Child Communication

  • Conference paper
Advances in Multimedia Information Processing - PCM 2006

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4261)


Abstract

This paper presents our recent and current work on expressive speech synthesis and recognition as enabling technologies for affective robot-child interaction. We show that current expression recognition systems could be used to discriminate between several archetypical emotions, but also that the old adage "there's no data like more data" is more valid than ever in this field. A new speech synthesizer was developed that is capable of high-quality concatenative synthesis. This system will be used in the robot to synthesize expressive nonsense speech by using prosody transplantation and a recorded database of expressive speech examples. With these enabling components falling into place, we are getting ready to start experiments towards effective child-machine communication of affect and emotion.
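
As a rough illustration of the prosody-transplantation idea mentioned in the abstract (not the authors' actual implementation), the sketch below copies the F0 contour of an expressive recording onto a neutral synthetic utterance of comparable length, using Praat's overlap-add (PSOLA) resynthesis through the parselmouth Python library. The file names are hypothetical, and duration and intensity transfer are omitted for brevity.

    # Minimal prosody-transplantation sketch (illustration only, not the paper's system).
    # Assumes two roughly time-aligned recordings; the file names are hypothetical.
    import parselmouth
    from parselmouth.praat import call

    def transplant_pitch(source_wav, target_wav, out_wav,
                         f0_min=75.0, f0_max=600.0):
        """Impose the pitch contour of source_wav onto target_wav via PSOLA."""
        source = parselmouth.Sound(source_wav)   # expressive recording
        target = parselmouth.Sound(target_wav)   # neutral synthetic speech

        # Praat Manipulation objects expose PSOLA-style analysis and resynthesis.
        src_manip = call(source, "To Manipulation", 0.01, f0_min, f0_max)
        tgt_manip = call(target, "To Manipulation", 0.01, f0_min, f0_max)

        # Take the pitch tier of the expressive example ...
        src_pitch_tier = call(src_manip, "Extract pitch tier")

        # ... impose it on the neutral utterance and resynthesize.
        call([tgt_manip, src_pitch_tier], "Replace pitch tier")
        resynthesis = call(tgt_manip, "Get resynthesis (overlap-add)")
        resynthesis.save(out_wav, "WAV")

    if __name__ == "__main__":
        transplant_pitch("expressive_example.wav", "neutral_nonsense.wav",
                         "transplanted.wav")

In the paper's setting the target would be nonsense speech produced by the concatenative synthesizer, and segment durations would be transplanted as well (for example via a Praat DurationTier and "Replace duration tier" before resynthesis); the sketch above only transfers the pitch contour.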




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yilmazyildiz, S., Mattheyses, W., Patsis, Y., Verhelst, W. (2006). Expressive Speech Recognition and Synthesis as Enabling Technologies for Affective Robot-Child Communication. In: Zhuang, Y., Yang, S.Q., Rui, Y., He, Q. (eds.) Advances in Multimedia Information Processing - PCM 2006. PCM 2006. Lecture Notes in Computer Science, vol 4261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11922162_1


  • DOI: https://doi.org/10.1007/11922162_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-48766-1

  • Online ISBN: 978-3-540-48769-2

  • eBook Packages: Computer Science, Computer Science (R0)
