Abstract
In the present work we report results from ongoing research on speaker-independent emotion recognition. Experiments examine the behavior of a detector of negative emotional states on acted and non-acted speech. Furthermore, score-level fusion of two classifiers at the utterance level is applied in an attempt to improve the performance of the emotion recognizer. Experimental results demonstrate significant differences in recognizing emotions on acted versus real-world speech.
© 2008 Springer-Verlag Berlin Heidelberg
Kostoulas, T., Ganchev, T., Fakotakis, N. (2008). Study on Speaker-Independent Emotion Recognition from Speech on Real-World Data. In: Esposito, A., Bourbakis, N.G., Avouris, N., Hatzilygeroudis, I. (eds) Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction. Lecture Notes in Computer Science, vol. 5042. Springer, Berlin, Heidelberg.
DOI: https://doi.org/10.1007/978-3-540-70872-8_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70871-1
Online ISBN: 978-3-540-70872-8