Speaker Characteristics and Emotion Classification

Batliner, Anton; Huber, Richard

doi:10.1007/978-3-540-74200-5_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4343)

Abstract

In this paper, we address the interrelated problems of speaker characteristics (personalization) and suboptimal performance of emotion classification in state-of-the-art modules from two different points of view. First, we focus on a specific phenomenon (irregular phonation, or laryngealization) and argue that its inherent multi-functionality and speaker-dependency make its use as a feature in emotion classification less promising than one might expect. Second, we focus on a specific application of emotion recognition in a voice portal and argue that constraints on time and budget often prevent the implementation of an optimal emotion recognition module.

Author information

Authors and Affiliations

Lehrstuhl für Mustererkennung, Universität Erlangen–Nürnberg, Martensstr. 3, 91058 Erlangen, Germany
Anton Batliner
Sympalog Voice Solutions GmbH, Karl-Zucker-Str. 10, 91052 Erlangen, Germany
Richard Huber

Editor information

Christian Müller

About this chapter

Cite this chapter

Batliner, A., Huber, R. (2007). Speaker Characteristics and Emotion Classification. In: Müller, C. (ed.) Speaker Classification I. Lecture Notes in Computer Science, vol 4343. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74200-5_7

DOI: https://doi.org/10.1007/978-3-540-74200-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74186-2
Online ISBN: 978-3-540-74200-5
eBook Packages: Computer Science, Computer Science (R0)