Problems of the Automatic Emotion Recognitions in Spontaneous Speech; An Example for the Recognition in a Dispatcher Center

  • Klára Vicsi
  • Dávid Sztahó
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6456)


This paper discusses the numerous difficulties of examining emotions in continuous spontaneous speech, then presents several emotion recognition experiments that use clauses as the recognition unit. A test experiment on a spontaneous speech database examined which acoustic features are most important for characterizing emotions. An SVM classifier was built to classify the 4 most frequent emotions. Fundamental frequency, energy, and their dynamics within a clause were found to be the main characteristic parameters of the emotions, and average spectral information, such as MFCC and harmonicity, is also very important. In a real-life experiment, an automatic recognition system was prepared for a telecommunication call center. Summing up the results of these experiments, clauses can be an optimal unit for the recognition of emotions in continuous speech.
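As an illustration of what clause-level prosodic features could look like, the following minimal sketch summarizes frame-level fundamental frequency and energy contours into one fixed-length vector per clause, which could then be passed to a classifier such as an SVM. The function names, the chosen statistics, and the convention that unvoiced frames carry an F0 of 0 are assumptions made for this example; the abstract does not specify the exact feature set used by the authors.

```python
from statistics import mean, pstdev


def contour_stats(values):
    """Summarize one frame-level contour (e.g. F0 in Hz, energy in dB) over a clause."""
    if len(values) < 2:
        raise ValueError("need at least two frames to measure dynamics")
    # Frame-to-frame differences serve as a simple stand-in for "dynamics".
    deltas = [b - a for a, b in zip(values, values[1:])]
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "range": max(values) - min(values),
        "mean_abs_delta": mean([abs(d) for d in deltas]),
    }


def clause_features(f0, energy):
    """Build one feature dictionary for a clause (hypothetical feature set)."""
    # Assumption: unvoiced frames are marked with F0 == 0 and are dropped,
    # so deltas are taken across the remaining voiced frames only.
    voiced = [v for v in f0 if v > 0]
    feats = {}
    for name, contour in (("f0", voiced), ("energy", energy)):
        for stat, value in contour_stats(contour).items():
            feats[f"{name}_{stat}"] = value
    return feats


if __name__ == "__main__":
    # Toy contours for a single clause, values invented for illustration.
    print(clause_features([0, 100, 110, 120, 0], [60, 62, 61, 63, 60]))
```

Spectral descriptors such as MFCC averages or harmonicity would need a signal-processing library and are left out of this sketch.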


speech emotion recognition; telephone speech; prosodic recognizer; speech emotion database





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Klára Vicsi (1)
  • Dávid Sztahó (1)
  1. Laboratory of Speech Acoustics, Department of Telecommunication and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
