Problems of Automatic Emotion Recognition in Spontaneous Speech: An Example of Recognition in a Dispatcher Center
This paper discusses the numerous difficulties in examining emotions that occur in continuous spontaneous speech, and then presents several emotion recognition experiments that use the clause as the recognition unit. A preliminary experiment on a spontaneous speech database examined which acoustic features are the most important for characterizing emotions. An SVM classifier was built to classify the four most frequent emotions. Fundamental frequency, energy, and their dynamics within a clause were found to be the main characteristic parameters of the emotions, while averaged spectral information, such as MFCCs and harmonicity, is also very important. In a real-life experiment, an automatic recognition system was prepared for a telecommunication call center. Summing up the results of these experiments, we conclude that the clause can be an optimal unit for emotion recognition in continuous speech.
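The clause-level approach described above can be illustrated with a minimal sketch. This is not the authors' implementation: the feature set (clause-level statistics of F0 and energy and their frame-to-frame dynamics), the synthetic toy data, and the use of scikit-learn's `SVC` are all illustrative assumptions standing in for the paper's actual feature extraction and classifier setup.

```python
import numpy as np
from sklearn.svm import SVC  # assumed stand-in for the paper's SVM classifier


def clause_features(f0, energy):
    """Hypothetical clause-level feature vector: mean, range, and
    mean frame-to-frame change (dynamics) of F0 and energy."""
    return np.array([
        f0.mean(), f0.max() - f0.min(), np.abs(np.diff(f0)).mean(),
        energy.mean(), energy.max() - energy.min(), np.abs(np.diff(energy)).mean(),
    ])


# Synthetic toy clauses for two emotion classes (frame sequences of
# F0 in Hz and energy in dB); real data would come from an annotated
# spontaneous speech database segmented into clauses.
rng = np.random.default_rng(0)
X = np.vstack(
    [clause_features(rng.normal(120, 10, 50), rng.normal(60, 5, 50)) for _ in range(20)]
    + [clause_features(rng.normal(220, 40, 50), rng.normal(70, 15, 50)) for _ in range(20)]
)
y = [0] * 20 + [1] * 20  # two emotion labels for the sketch; the paper uses four

clf = SVC(kernel="rbf").fit(X, y)
```

In the paper's setup, such per-clause vectors would also include spectral features (MFCCs, harmonicity), and the classifier would distinguish the four most frequent emotions rather than two.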
Keywords: speech emotion recognition; telephone speech; prosodic recognizer; speech emotion database