Toward a Rule-Based Synthesis of Emotional Speech on Linguistic Descriptions of Perception

Conference paper
Affective Computing and Intelligent Interaction (ACII 2005)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3784)

Abstract

This paper reports rules for morphing a voice so that listeners perceive various primitive features in it, for example, to make it sound more “bright” or “dark”. In previous work we proposed a three-layered model for the perception of emotional speech, whose layers are emotional speech, primitive features, and acoustic features. Through perceptual experiments and acoustic analysis, we established the relationships between the three layers and reported that those relationships are significant. We then adopted a bottom-up method to verify the relationships: we morphed (resynthesized) a speech sample by composing acoustic features in the bottommost layer to produce a voice in which listeners could perceive one or more primitive features, which in turn could be perceived as different categories of emotion. The intermediate results show that the relationships in the model built in previous work are valid.
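The abstract describes the three-layered model only at a high level. As a rough illustration of how such a bottom-up rule set might be organized in software, here is a minimal Python sketch; every layer, feature name, parameter, and weight below is a hypothetical assumption chosen for illustration, not a value or API from the paper.

```python
# Hypothetical sketch of a three-layered rule structure
# (emotion <- primitive features <- acoustic features).
# All names, parameters, and weights are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class AcousticModification:
    """Bottom layer: how to morph the resynthesized signal."""
    f0_scale: float = 1.0        # multiply the F0 contour
    duration_scale: float = 1.0  # stretch or compress segment durations
    tilt_shift_db: float = 0.0   # raise/lower high-frequency spectral energy

# Middle layer: rules intended to make a voice be perceived as
# carrying a primitive feature (parameter values are made up).
PRIMITIVE_RULES = {
    "bright": AcousticModification(f0_scale=1.15, tilt_shift_db=+3.0),
    "dark":   AcousticModification(f0_scale=0.90, tilt_shift_db=-3.0),
    "fast":   AcousticModification(duration_scale=0.85),
}

# Top layer: emotion categories as weighted combinations of
# primitive features (again, illustrative weights only).
EMOTION_RULES = {
    "joy":     {"bright": 0.8, "fast": 0.5},
    "sadness": {"dark": 0.9},
}


def plan_morph(emotion: str) -> AcousticModification:
    """Compose acoustic modifications bottom-up for a target emotion."""
    plan = AcousticModification()
    for feature, weight in EMOTION_RULES[emotion].items():
        rule = PRIMITIVE_RULES[feature]
        # Interpolate each parameter toward the rule's value by its weight.
        plan.f0_scale *= 1.0 + weight * (rule.f0_scale - 1.0)
        plan.duration_scale *= 1.0 + weight * (rule.duration_scale - 1.0)
        plan.tilt_shift_db += weight * rule.tilt_shift_db
    return plan


if __name__ == "__main__":
    # e.g. F0 raised, durations slightly shortened, brighter spectral tilt
    print(plan_morph("joy"))
```

In the paper the resulting modification plan would drive a high-quality analysis/resynthesis system (the authors cite STRAIGHT); the sketch stops at producing the parameter plan.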


Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, CF., Akagi, M. (2005). Toward a Rule-Based Synthesis of Emotional Speech on Linguistic Descriptions of Perception. In: Tao, J., Tan, T., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2005. Lecture Notes in Computer Science, vol 3784. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573548_47

  • DOI: https://doi.org/10.1007/11573548_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29621-8

  • Online ISBN: 978-3-540-32273-3

  • eBook Packages: Computer Science (R0)
