Expressive Speech Processing and Prosody Engineering: An Illustrated Essay on the Fragmented Nature of Real Interactive Speech

  • Nick Campbell


This chapter addresses expressive speech processing. It explains a mechanism for expressiveness in speech and proposes a novel dimension of spoken-language processing for speech technology applications, showing that although great progress has already been made, much remains to be done before speech processing can be considered a truly mature technology.


Keywords: Propositional content, Voice quality, Speech synthesis, Telephone conversation, Conversational speech



This work is partly supported by the Ministry of Public Management, Home Affairs, Posts, and Telecommunications, Japan, under the SCOPE funding initiative. The ESP corpus was collected over a period of 5 years with support from the Japan Science & Technology Corporation's Core Research for Evolutional Science & Technology (JST/CREST) funding initiative. The author also wishes to thank the management of the Spoken Language Communication Research Laboratory and the Advanced Telecommunications Research Institute International for their continuing support and encouragement of this work. The chapter was written while the author was employed by NICT, the National Institute of Information and Communications Technology, Japan. He is currently employed by Trinity College, the University of Dublin, Ireland, as Stokes Professor of Speech & Communication Technology.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Centre for Language and Communication Studies (CLCS), The University of Dublin, Dublin 2, Ireland
