Abstract
In this paper, we address the — interrelated — problems of speaker characteristics (personalization) and suboptimal performance of emotion classification in state-of-the-art modules from two different points of view: first, we focus on a specific phenomenon (irregular phonation or laryngealization) and argue that its inherent multi-functionality and speaker-dependency makes its use as feature in emotion classification less promising than one might expect. Second, we focus on a specific application of emotion recognition in a voice portal and argue that constraints on time and budget often prevent the implementation of an optimal emotion recognition module.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Cowie, R., Cornelius, R.: Describing the emotional states that are expressed in speech. Speech Communication 40, 5–32 (2003)
Schuller, B., Müller, R., Lang, M., Rigoll, G.: Speaker Independent Emotion Recognition by Early Fusion of Acoustic and Linguistic Features within Ensembles. In: Proc. 9th Eurospeech - Interspeech 2005, Lisbon, pp. 805–808 (2005)
Labov, W.: The Study of Language in its Social Context. Studium Generale 3, 30–87 (1970)
Batliner, A., Steidl, S., Schuller, B., Seppi, D., Laskowski, K., Vogt, T., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V.: Combining Efforts for Improving Automatic Classification of Emotional User States. In: Proceedings of IS-LTC 2006, Ljubliana, pp. 240–245 (2006)
Batliner, A., Steidl, S., Hacker, C., Nöth, E., Niemann, H.: Tales of Tuning – Prototyping for Automatic Classification of Emotional User States. In: Proc. 9th Eurospeech - Interspeech 2005, Lisbon, pp. 489–492 (2005)
Schuller, B., Seppi, D., Batliner, A., Meier, A., Steidl, S.: Towards more Reality in the Recognition of Emotional Speech. In: Proc. of ICASSP 2007, Honolulu (to appear)
Scherer, K.: Vocal communication of emotion: A review of research paradigms. Speech Communication 40, 227–256 (2003)
Poggi, I., Pelachaud, C., de Carolis, B.: To Display or Not To Display? Towards the Architecture of a Reflexive Agent. In: Proceedings of the 2nd Workshop on Attitude, Personality and Emotions in User-adapted Interaction, User Modeling 2001, 7 pages (2001) (no pagination)
Batliner, A., Burger, S., Johne, B., Kießling, A.: MÜSLI: A Classification Scheme For Laryngealizations. In: House, D., Touati, P. (eds.) Proc. of an ESCA Workshop on Prosody. Lund University, Department of Linguistics, Lund, pp. 176–179 (1993)
Local, J., Kelly, J.: Projection and ‘silences’: notes on phonetic and conversational structure. Human Studies 9, 185–204 (1986)
Kushan, S., Slifka, J.: Is irregular phonation a reliable cue towards the segmentation of continuous speech in American English? In: Proc. of Speech Prosody 2006, Dresden, pp. 795–798 (2006)
Ní Chasaide, A., Gobl, C.: Voice Quality and f 0 in Prosody: Towards a Holistic Account. In: Proc. of Speech Prosody 2004, Nara, Japan, 4 pages (2004) (no pagination)
Ladefoged, P., Maddieson, I.: The Sound of the World’s Languages. Blackwell, Oxford (1996)
Gerfen, C., Baker, K.: The production and perception of laryngealized vowels in Coatzospan Mixtec. Journal of Phonetics, 311–334 (2005)
Fischer-Jørgensen, E.: Phonetic analysis of the stød in standard Danish. Phonetica 46, 1–59 (1989)
Laver, J.: Principles of Phonetics. Cambridge University Press, Cambridge (1994)
Wilden, I., Herzel, H., Peters, G., Tembrock, G.: Subharmonics, biphonation, and deterministic chaos in mammal vocalization. Bioacoustics 9, 171–196 (1998)
Freese, J., Maynard, D.W.: Prosodic features of bad news and good news in conversation. Language in Society 27, 195–219 (1998)
Gobl, C., Ní Chasaide, A.: The role of voice quality in communicating emotion, mood and attitude. Speech Communication 40(1-2), 189–212 (2003)
Drioli, C., Tisato, G., Cosi, P., Tesser, F.: Emotions and Voice Quality: Experiments with Sinusoidal Modeling. In: Proceedings of VOQUAL 2003, Geneva, pp. 127–132 (2003)
Ishi, C., Ishiguro, H., Hagita, N.: Using Prosodic and Voice Quality Features for Paralinguistic Information Extraction. In: Proc. of Speech Prosody 2006, Dresden, pp. 883–886 (2006)
Kießling, A., Kompe, R., Niemann, H., Nöth, E., Batliner, A.: Voice Source State as a Source of Information in Speech Recognition: Detection of Laryngealizations. In: Rubio Ayuso, A., López Soler, J. (eds.) Speech Recognition and Coding. New Advances and Trends. NATO ASI Series F, vol. 147, pp. 329–332. Springer, Heidelberg (1995)
Ishi, C., Ishiguro, H., Hagita, N.: Proposal of Acoustic Measures for Automatic Detection of Vocal Fry. In: Proc. 9th Eurospeech - Interspeech 2005, Lisbon, pp. 481–484 (2005)
Devillers, L., Vidrascu, L.: Real-life Emotion Recognition in Speech. In: Müller, C. (ed.) Speaker Classification II. LNCS(LNAI), vol. 4441, Springer, Heidelberg (2007)
Batliner, A., Buckow, J., Huber, R., Warnke, V., Nöth, E., Niemann, H.: Prosodic Feature Evaluation: Brute Force or Well Designed? In: Proc. of the 14th Int. Congress of Phonetic Sciences, San Francisco, vol. 3, pp. 2315–2318 (1999)
Batliner, A., Buckow, J., Huber, R., Warnke, V., Nöth, E., Niemann, H.: Boiling down Prosody for the Classification of Boundaries and Accents in German and English. In: Proc. 7th Eurospeech, Aalborg, pp. 2781–2784 (2001)
Batliner, A., Möbius, B.: Prosodic Models, Automatic Speech Understanding, and Speech Synthesis: Towards the Common Ground? In: Barry, W., Dommelen, W. (eds.) The Integration of Phonetic Knowledge in Speech Technology, pp. 21–44. Springer, Heidelberg (2005)
Kochanski, G., Grabe, E., Coleman, J., Rosner, B.: Loudness predicts Prominence; Fundamental Frequency lends little. Journal of Acoustical Society of America 11, 1038–1054 (2005)
Burkhardt, F., van Ballegooy, M., Englert, R., Huber, R.: An emotion-aware voice portal. In: Proc. Electronic Speech Signal Processing ESSP (2005)
Burkhardt, F., Stegmann, J., Ballegooy, M.V.: A voiceportal enhanced by semantic processing and affect awareness [34], pp. 582–586
Huber, R., Gallwitz, F., Warnke, V.: Verbesserung eines Voiceportals mit Hilfe akustischer Klassifikation von Emotion [34], pp. 577–581
Batliner, A., Burkhardt, F., van Ballegooy, M., Nöth, E.: A Taxonomy of Applications that Utilize Emotional Awareness. In: Proceedings of IS-LTC 2006, Ljubliana, pp. 246–250 (2006)
Burkhardt, F., Huber, R., Batliner, A.: Application of Speaker Classification in Human Machine Dialog Systems. In: Müller, C. (ed.) Speaker Classification I. LNCS(LNAI), vol. 4343, Springer, Heidelberg (2007)
Cremers, A.B., Manthey, R., Martini, P., Steinhage, V. (eds.): INFORMATIK 2005 - Informatik LIVE! Band 2, Beiträge der 35. Jahrestagung der Gesellschaft für Informatik e.V (GI), Bonn, 19. bis 22 (September 2005). In: Cremers, A.B., Manthey, R., Martini, P., Steinhage, V. (eds.) GI Jahrestagung (2). LNI., vol. 68, GI (2005)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Batliner, A., Huber, R. (2007). Speaker Characteristics and Emotion Classification. In: Müller, C. (eds) Speaker Classification I. Lecture Notes in Computer Science(), vol 4343. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74200-5_7
Download citation
DOI: https://doi.org/10.1007/978-3-540-74200-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74186-2
Online ISBN: 978-3-540-74200-5
eBook Packages: Computer ScienceComputer Science (R0)