Emotion Classification of Audio Signals Using Ensemble of Support Vector Machines

Conference paper · Perception in Multimodal Dialogue Systems (PIT 2008)

Abstract

This study presents an approach to emotion classification of speech utterances based on an ensemble of support vector machines. We use feature-level fusion of MFCCs, total energy, and F0 as the input feature vector, and adopt the bagging method for classification. We also present a new emotional dataset built from a popular animated film, Finding Nemo, in which emotions are strongly emphasized to hold the audience's attention. Speech utterances were extracted directly from the film's audio channel, including all background noise. In total, 2054 utterances from 24 speakers were annotated by a group of volunteers using seven emotion categories; we concentrated on perceived emotion. Our approach was tested on this newly developed dataset (EFN) as well as on the publicly available DES and EmoDB datasets. Experiments showed overall accuracies of 77.5% and 66.8% for four- and five-class classification on the EFN dataset, respectively. In addition, we achieved 67.6% accuracy on DES (five classes) and 63.5% on EmoDB (seven classes) using an ensemble of SVMs with 10-fold cross-validation.
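As a rough illustration of the pipeline the abstract describes, the sketch below fuses MFCC, total energy, and F0 statistics at the feature level and classifies them with a bagged ensemble of SVMs under 10-fold cross-validation. This is not the authors' implementation: the library choices (librosa, scikit-learn), all parameter values, and the names extract_features, utterance_paths, and labels are illustrative assumptions.

```python
# Minimal sketch of the described pipeline, under assumed libraries and
# parameters; the paper's actual feature extraction and SVM setup may differ.
import numpy as np
import librosa
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_features(path, sr=16000):
    """Fuse MFCC, total energy, and F0 statistics into one vector."""
    audio, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)  # frame-level MFCCs
    energy = librosa.feature.rms(y=audio)                   # frame-level energy
    f0 = librosa.yin(audio, fmin=60, fmax=400, sr=sr)       # F0 contour
    # Feature-level fusion: summarize each frame-level contour by its mean
    # and standard deviation, then concatenate into one utterance vector.
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        [energy.mean(), energy.std(), f0.mean(), f0.std()],
    ])

# utterance_paths and labels are placeholders for an annotated corpus.
X = np.array([extract_features(p) for p in utterance_paths])
clf = make_pipeline(
    StandardScaler(),
    # Bagging: train each SVM on a bootstrap sample, then vote.
    # (On scikit-learn < 1.2, pass base_estimator= instead of estimator=.)
    BaggingClassifier(estimator=SVC(kernel="rbf"), n_estimators=10),
)
scores = cross_val_score(clf, X, labels, cv=10)             # 10-fold CV
print(f"mean accuracy: {scores.mean():.3f}")
```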

References

  • Altun, H., Polat, G.: New Frameworks to Boost Feature Selection Algorithms in Emotion Detection for Improved Human-Computer Interaction. In: Mele, F., Ramella, G., Santillo, S., Ventriglia, F. (eds.) BVAI 2007. LNCS, vol. 4729, pp. 533–541. Springer, Heidelberg (2007)

  • Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)

  • Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., Weiss, B.: A Database of German Emotional Speech. In: Proc. INTERSPEECH 2005, ISCA, Lisbon, Portugal, pp. 1517–1520 (2005)

  • Danisman, T., Alpkocak, A.: Speech vs. Nonspeech Segmentation of Audio Signals Using Support Vector Machines. In: Signal Processing and Communication Applications Conference, Eskisehir, Turkey (2007)

  • Datcu, D., Rothkrantz, L.J.M.: Facial expression recognition with Relevance Vector Machines. In: IEEE International Conference on Multimedia & Expo (ICME 2005) (2005) ISBN 0-7803-9332-5

  • Engberg, I.S., Hansen, A.V.: Documentation of the Danish Emotional Speech Database (DES). Internal AAU report, Center for Person Kommunikation, Denmark (1996)

  • Fujisaki, H., Hirose, K.: Analysis of voice fundamental frequency contours for declarative sentences of Japanese. Journal of the Acoustical Society of Japan 5(4), 233–242 (1984)

  • Hammal, Z., Bozkurt, B., Couvreur, L., Unay, U., Caplier, A., Dutoit, T.: Passive versus active: Vocal classification system. In: Proc. XIII European Signal Processing Conf., Antalya, Turkey (2005)

  • Joachims, T.: Making Large-Scale SVM Learning Practical. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods - Support Vector Learning, MIT-Press, Cambridge (1999)

  • Le, X.H., Quenot, G., Castelli, E.: Speaker-Dependent Emotion Recognition for Audio Document Indexing. In: International Conference on Electronics, Information, and Communications (ICEIC 2004) (2004)

  • Lugger, M., Yang, B.: An Incremental Analysis of Different Feature Groups In Speaker Independent Emotion Recognition. In: 16th Int. Congress of Phonetic Sciences (2007)

  • Lugger, M., Yang, B.: Classification of different speaking groups by means of voice quality parameters. ITG-Sprach-Kommunikation (2006)

  • Pampalk, E.: A Matlab Toolbox to Compute Music Similarity from Audio. In: Proc. of the 5th Int. Conference on Music Information Retrieval, pp. 254–257 (2004)

  • Paeschke, A., Sendlmeier, W.F.: Prosodic Characteristics of Emotional Speech: Measurements of Fundamental Frequency Movements. In: Proceedings of ISCA Workshop on Speech and Emotion, Northern Ireland, pp. 75–80 (2000)

  • Sedaaghi, M.H., Kotropoulos, C., Ververidis, D.: Using Adaptive Genetic Algorithms to Improve Speech Emotion Recognition. In: IEEE 9th Workshop on Multimedia Signal Processing, MMSP 2007, pp. 461–464 (2007)

  • Shami, M., Verhelst, W.: An evaluation of the robustness of existing supervised machine learning approaches to the classification of emotions in speech. Speech Communication 49(3), 201–212 (2007)

  • Teodorescu, H.-N., Feraru, S.M.: A Study on Speech with Manifest Emotions. In: Matoušek, V., Mautner, P. (eds.) TSD 2007. LNCS (LNAI), vol. 4629, pp. 254–261. Springer, Heidelberg (2007)

  • Ververidis, D., Kotropoulos, C., Pitas, I.: Automatic Emotional Speech Classification. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Montreal, Canada, pp. 593–596 (2004)

  • Ververidis, D., Kotropoulos, C.: Emotional speech recognition: Resources, features, and methods. Speech Communication 48(9), 1162–1181 (2006)

  • Zervas, P., Mporas, I., Fakotakis, N., Kokkinakis, G.: Employing Fujisaki’s Intonation Model Parameters for Emotion Recognition. In: Antoniou, G., Potamias, G., Spyropoulos, C., Plexousakis, D. (eds.) SETN 2006. LNCS (LNAI), vol. 3955, pp. 443–453. Springer, Heidelberg (2006)

  • Zhongzhe, X., Dellandrea, E., Dou, W., Chen, L.: Two-stage Classification of Emotional Speech. In: Int. Conference on Digital Telecommunications, p. 32 (2006)

  • Zhou, Z.H., Wu, J., Tang, W.: Ensembling Neural Networks: Many Could Be Better Than All. Artificial Intelligence 137(1-2), 239–263 (2002)

Editor information

Elisabeth André, Laila Dybkjær, Wolfgang Minker, Heiko Neumann, Roberto Pieraccini, Michael Weber

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Danisman, T., Alpkocak, A. (2008). Emotion Classification of Audio Signals Using Ensemble of Support Vector Machines. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Pieraccini, R., Weber, M. (eds) Perception in Multimodal Dialogue Systems. PIT 2008. Lecture Notes in Computer Science, vol. 5078. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69369-7_23

  • DOI: https://doi.org/10.1007/978-3-540-69369-7_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69368-0

  • Online ISBN: 978-3-540-69369-7

  • eBook Packages: Computer Science, Computer Science (R0)
