Expressive Speech Processing and Prosody Engineering: An Illustrated Essay on the Fragmented Nature of Real Interactive Speech
This chapter addresses the issue of expressive speech processing. It attempts to explain a mechanism for expressiveness in speech, and proposes a novel dimension of spoken language processing for speech technology applications, showing that although great progress has already been made, there is still much to be done before we can consider speech processing to be a truly mature technology.
Keywords: Propositional Content; Voice Quality; Speech Synthesis; Telephone Conversation; Conversational Speech
This work is partly supported by the Ministry of Public Management, Home Affairs, Posts, and Telecommunications, Japan under the SCOPE funding initiative. The ESP corpus was collected over a period of 5 years with support from the Japan Science & Technology Corporation (JST/CREST) Core Research for Evolutional Science & Technology funding initiative. The author also wishes to thank the management of the Spoken Language Communication Research Laboratory and the Advanced Telecommunications Research Institute International for their continuing support and encouragement of this work. The chapter was written while the author was employed by NiCT, the National Institute of Information and Communications Technology. He is currently employed by Trinity College, the University of Dublin, Ireland, as Stokes Professor of Speech & Communication Technology.