Speaker Characteristics and Emotion Classification

  • Chapter
Speaker Classification I

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4343)

Abstract

In this paper, we address the interrelated problems of speaker characteristics (personalization) and suboptimal performance of emotion classification in state-of-the-art modules from two different points of view. First, we focus on a specific phenomenon (irregular phonation or laryngealization) and argue that its inherent multi-functionality and speaker-dependency make its use as a feature in emotion classification less promising than one might expect. Second, we focus on a specific application of emotion recognition in a voice portal and argue that constraints on time and budget often prevent the implementation of an optimal emotion recognition module.

Editor information

Christian Müller

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Batliner, A., Huber, R. (2007). Speaker Characteristics and Emotion Classification. In: Müller, C. (ed.) Speaker Classification I. Lecture Notes in Computer Science (LNAI), vol 4343. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74200-5_7

  • DOI: https://doi.org/10.1007/978-3-540-74200-5_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74186-2

  • Online ISBN: 978-3-540-74200-5

  • eBook Packages: Computer Science, Computer Science (R0)
