Abstract
One hundred thirty-three (133) sound/speech features extracted from pitch, Mel-frequency cepstral coefficients, energy and formants were evaluated in order to create a feature set sufficient to discriminate between seven emotions in acted speech. After feature selection, multilayer perceptrons were trained for emotion recognition on the basis of a 23-input vector, which provides information about the prosody of the speaker over the entire sentence. Several experiments were performed and the results are presented in detail. Particular emphasis was placed on assessing the proposed 23-input vector in a speaker-independent framework, where speakers are not “known” to the classifier. The proposed feature vector achieved promising results (51%) for speaker-independent recognition in seven emotion classes. Moreover, for the problem of classifying high- and low-arousal emotions, this classifier reaches 86.8% successful recognition. The second classification model used a Support Vector Machine with 35 predictive variables. This second feature vector achieved promising results (78%) for speaker-independent recognition in seven emotion classes. On the high/low arousal problem, the second classifier reaches 100% successful recognition for high-arousal emotions and 87% for low-arousal emotions. Besides the combination of speech processing and artificial intelligence techniques, new approaches incorporating linguistic semantics could play a critical role in helping computers understand human emotions better.
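As a rough illustration of what a speaker-independent framework implies for evaluation, the sketch below builds leave-one-speaker-out folds, so that no utterance from the test speaker is seen during training. This is not the authors' code: the function name `leave_one_speaker_out`, the speaker labels, and the choice of leave-one-speaker-out as the splitting scheme are all assumptions made for the example, since the abstract does not state how the speakers were partitioned.

```python
from collections import defaultdict


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker.

    Hypothetical helper: every utterance of the held-out speaker goes to the
    test set, so the classifier never "knows" that speaker during training.
    """
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        by_speaker[spk].append(idx)

    for held_out in sorted(by_speaker):
        test_idx = list(by_speaker[held_out])
        train_idx = sorted(
            i
            for spk, idxs in by_speaker.items()
            if spk != held_out
            for i in idxs
        )
        yield held_out, train_idx, test_idx


# Toy example with made-up speaker labels, one label per utterance.
speakers = ["s1", "s1", "s2", "s3", "s2", "s3"]
folds = list(leave_one_speaker_out(speakers))

for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

In a real experiment each index would point to one utterance-level feature vector (for instance the 23-input vector), and a classifier would be trained and scored once per fold.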
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Anagnostopoulos, CN., Iliou, T. (2010). Towards Emotion Recognition from Speech: Definition, Problems and the Materials of Research. In: Wallace, M., Anagnostopoulos, I.E., Mylonas, P., Bielikova, M. (eds) Semantics in Adaptive and Personalized Services. Studies in Computational Intelligence, vol 279. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11684-1_8
DOI: https://doi.org/10.1007/978-3-642-11684-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11683-4
Online ISBN: 978-3-642-11684-1
eBook Packages: Engineering, Engineering (R0)