Whence and Whither: The Automatic Recognition of Emotions in Speech (Invited Keynote)

  • Conference paper
Perception in Multimodal Dialogue Systems (PIT 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5078)

Abstract

In this talk, we first sketch the (short) history of the automatic recognition of emotions in speech. Studies on the characteristics of emotions in speech were published as early as the 1920s and 1930s; attempts to recognize them automatically began in the mid-1990s and dealt with acted data. Such data are still used often, too often in fact, considering that drawing inferences from acted data about realistic data is sub-optimal at best.

In the second part, we present the necessary ‘basics’: the design of the scenario, the recordings, the manual processing (transliteration, annotation), etc. To some extent, these basics are ‘generic’: for instance, every speech database has to be transliterated orthographically in some way. Others are specific, such as the principles and guidelines for emotion annotation and the basic choice between, for example, dimensional and categorical approaches. The pros and cons of different annotation approaches have been discussed widely; however, the unit of analysis (utterance, turn, sentence, etc.) has rarely been dealt with, so we discuss this topic in more detail.

In the third part, we present acoustic and linguistic features that have been used (or should be used) in this field and touch on their differing degrees of relevance.

The fourth part addresses classification and its necessary ingredients, such as feature reduction and selection, the choice of classifier, and the assessment of classification performance.
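
Assessing classification performance is one point where a small example helps, because emotion databases built from realistic data are often dominated by the neutral class. The following Python sketch is not taken from the talk; it assumes the common practice of reporting both weighted average recall (WA, which equals plain accuracy) and unweighted average recall (UA, the mean of the per-class recalls). The function name `average_recalls` is hypothetical.

```python
from collections import Counter


def average_recalls(y_true, y_pred):
    """Return (WA, UA) for two equal-length label sequences.

    Hypothetical helper for illustration: WA weights each class by its
    frequency (equal to accuracy), UA weights every class equally.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    support = Counter(y_true)  # number of reference items per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Per-class recall; Counter returns 0 for classes never hit
    recalls = {c: hits[c] / n for c, n in support.items()}
    wa = sum(hits.values()) / len(y_true)
    ua = sum(recalls.values()) / len(recalls)
    return wa, ua


if __name__ == "__main__":
    # Ten turns, eight neutral; the classifier always answers "neutral"
    truth = ["neutral"] * 8 + ["anger"] * 2
    guess = ["neutral"] * 10
    print(average_recalls(truth, guess))  # -> (0.8, 0.5)
```

A trivial majority-class classifier thus looks good under WA but not under UA, which is why UA is often preferred when the class distribution is skewed.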

Up to this point, the talk deals with the ‘whence’ in our title, depicting the state of the art; we end with the ‘whither’: promising applications and some speculation on dead-end approaches.

Editor information

Elisabeth André, Laila Dybkjær, Wolfgang Minker, Heiko Neumann, Roberto Pieraccini, Michael Weber

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Batliner, A. (2008). Whence and Whither: The Automatic Recognition of Emotions in Speech (Invited Keynote). In: André, E., Dybkjær, L., Minker, W., Neumann, H., Pieraccini, R., Weber, M. (eds) Perception in Multimodal Dialogue Systems. PIT 2008. Lecture Notes in Computer Science, vol 5078. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69369-7_1

  • DOI: https://doi.org/10.1007/978-3-540-69369-7_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69368-0

  • Online ISBN: 978-3-540-69369-7

  • eBook Packages: Computer Science (R0)