Abstract
This paper reports rules for morphing a voice so that it is perceived as containing various primitive features, for example, to make it sound more “bright” or “dark”. In previous work we proposed a three-layered model of the perception of emotional speech, whose layers correspond to emotional speech, primitive features, and acoustic features. Through perceptual experiments and acoustic analysis we established the relationships between the three layers and reported that these relationships are significant. A bottom-up method was then adopted to verify them: we morphed (resynthesized) a voice by composing acoustic features in the bottommost layer to produce a voice in which listeners could perceive one or more primitive features, which in turn could be perceived as different categories of emotion. Intermediate results show that the relationships in the model built in our previous work are valid.
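The abstract describes a bottom-up procedure: modify low-level acoustic features (e.g., the F0 contour and spectral shape) of a recorded utterance and resynthesize it so that a target primitive feature such as “bright” is perceived. As a rough, non-authoritative sketch of that kind of pipeline, the snippet below uses the open-source WORLD vocoder (the `pyworld` package) for analysis and resynthesis; the choice of vocoder, the file names, and the specific modification values (a 15% F0 raise, a gentle high-frequency emphasis) are illustrative assumptions, not the rules reported in the paper.

```python
import numpy as np
import pyworld as pw        # WORLD vocoder bindings (assumed stand-in)
import soundfile as sf

# Load a mono utterance; WORLD expects contiguous float64 samples.
x, fs = sf.read("input.wav")
x = np.ascontiguousarray(x, dtype=np.float64)

# Decompose into F0 contour, spectral envelope, and aperiodicity.
f0, sp, ap = pw.wav2world(x, fs)

# Hypothetical "bright" rule: raise F0 by 15% (unvoiced frames stay 0)
# and tilt the spectral envelope toward the high frequencies.
f0_bright = f0 * 1.15
gain = np.linspace(1.0, 2.0, sp.shape[1])   # flat at DC, doubled near Nyquist
sp_bright = np.ascontiguousarray(sp * gain)

# Resynthesize and save the morphed voice.
y = pw.synthesize(f0_bright, sp_bright, ap, fs)
sf.write("bright.wav", y, fs)
```

An analogous “dark” rule would lower F0 and attenuate the high frequencies; in the paper itself, the mapping from a primitive feature to acoustic modifications comes from the relationships established between the model's layers, not from fixed constants like these.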