Abstract
This paper reports rules for morphing a voice so that it is perceived as containing various primitive features, for example, to make it sound more “bright” or “dark”. In previous work we proposed a three-layered model of the perception of emotional speech, whose layers correspond to emotional speech, primitive features, and acoustic features. Through perceptual experiments and acoustic analysis we established the relationships between the three layers and reported that these relationships are significant. A bottom-up method was then adopted to verify them: we morphed (resynthesized) a voice by composing acoustic features in the bottommost layer to produce a voice in which listeners could perceive one or more primitive features, which in turn could be perceived as different categories of emotion. Intermediate results show that the relationships in the model built in our previous work are valid.
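The abstract describes a bottom-up procedure: modify low-level acoustic features (e.g., the F0 contour and spectral shape) of a recorded utterance and resynthesize it so that a target primitive feature such as “bright” is perceived. As a rough, non-authoritative sketch of that kind of pipeline, the snippet below uses the open-source WORLD vocoder (the `pyworld` package) for analysis and resynthesis; the choice of vocoder, the file names, and the specific modification values (a 15% F0 raise, a gentle high-frequency emphasis) are illustrative assumptions, not the rules reported in the paper.

```python
import numpy as np
import pyworld as pw        # WORLD vocoder bindings (assumed stand-in)
import soundfile as sf

# Load a mono utterance; WORLD expects contiguous float64 samples.
x, fs = sf.read("input.wav")
x = np.ascontiguousarray(x, dtype=np.float64)

# Decompose into F0 contour, spectral envelope, and aperiodicity.
f0, sp, ap = pw.wav2world(x, fs)

# Hypothetical "bright" rule: raise F0 by 15% (unvoiced frames stay 0)
# and tilt the spectral envelope toward the high frequencies.
f0_bright = f0 * 1.15
gain = np.linspace(1.0, 2.0, sp.shape[1])   # flat at DC, doubled near Nyquist
sp_bright = np.ascontiguousarray(sp * gain)

# Resynthesize and save the morphed voice.
y = pw.synthesize(f0_bright, sp_bright, ap, fs)
sf.write("bright.wav", y, fs)
```

An analogous “dark” rule would lower F0 and attenuate the high frequencies; in the paper itself, the mapping from a primitive feature to acoustic modifications comes from the relationships established between the model's layers, not from fixed constants like these.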