Studying Self- and Active-Training Methods for Multi-feature Set Emotion Recognition

  • José Esparza
  • Stefan Scherer
  • Friedhelm Schwenker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7081)


Automatic emotion classification has been studied from many different approaches. Previous research has shown that performance similar to that of humans can be achieved with an adequate combination of modalities and features. Nevertheless, large amounts of training data seem necessary to reach that level of accuracy in automatic classification. Labelling training, validation and test sets is generally a difficult and time-consuming task that restricts the experiments. Therefore, in this work we study self- and active-training methods and their performance in emotion classification from speech data, with the aim of reducing annotation costs. The results are compared, using confusion matrices, with human perception capabilities and with supervised training experiments, yielding similar accuracies.
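The abstract does not state how samples are selected in either scheme, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical classifier that outputs class posteriors per utterance: self-training keeps the confidently classified samples together with their predicted labels, while active training sends the least confident samples to a human annotator. The function names, the 0.9 threshold and the toy posteriors are all illustrative assumptions. A small confusion-matrix helper is included because the results are compared that way.

```python
from typing import List, Tuple

# Class probabilities for one utterance (output of a hypothetical classifier).
Posterior = List[float]


def self_training_picks(posteriors: List[Posterior],
                        threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Return (sample_index, pseudo_label) for confidently classified samples."""
    picks = []
    for i, p in enumerate(posteriors):
        label = max(range(len(p)), key=p.__getitem__)  # most probable class
        if p[label] >= threshold:  # threshold is an assumed hyperparameter
            picks.append((i, label))
    return picks


def active_learning_queries(posteriors: List[Posterior], k: int = 1) -> List[int]:
    """Return indices of the k least confident samples, to be labelled by a human."""
    by_confidence = sorted(range(len(posteriors)),
                           key=lambda i: max(posteriors[i]))
    return by_confidence[:k]


def confusion_matrix(true: List[int], pred: List[int],
                     n_classes: int) -> List[List[int]]:
    """Count matrix with true classes as rows and predicted classes as columns."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true, pred):
        matrix[t][p] += 1
    return matrix


# Toy posteriors for four unlabelled utterances and three emotion classes.
posteriors = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.05, 0.92, 0.03],
    [0.50, 0.45, 0.05],
]
pseudo_labelled = self_training_picks(posteriors, threshold=0.9)
to_annotate = active_learning_queries(posteriors, k=2)
cm = confusion_matrix(true=[0, 1, 1, 2], pred=[0, 1, 2, 2], n_classes=3)
```

In this toy run, utterances 0 and 2 would be pseudo-labelled by self-training, whereas utterances 1 and 3 would be queried in active training. Other confidence measures, such as the margin between the two most probable classes, could be substituted without changing the overall loop.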


Human perception of emotion; automatic emotion classification; semi-supervised learning; active learning; emotion recognition from speech





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • José Esparza (1)
  • Stefan Scherer (2)
  • Friedhelm Schwenker (1)

  1. Institute of Neural Information Processing, University of Ulm, Germany
  2. School of Linguistic, Speech and Communication Sciences, Trinity College, Dublin, Ireland
