On Annotation and Evaluation of Multi-modal Corpora in Affective Human-Computer Interaction

Kächele, Markus; Schels, Martin; Meudt, Sascha; Kessler, Viktor; Glodek, Michael; Thiam, Patrick; Tschechne, Stephan; Palm, Günther; Schwenker, Friedhelm

doi:10.1007/978-3-319-15557-9_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8757)

Included in the following conference series:

International Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction

Abstract

In this paper, we discuss affective human-computer interaction from a data-driven viewpoint. This comprises the collection of databases with emotional content, feasible annotation procedures, and software tools that support a suitable labeling process. We further discuss how to evaluate results computed with statistical classifiers. Based on this, we propose using fuzzy memberships to model the affective user state and endorse corresponding fuzzy performance measures.
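To make the idea of fuzzy memberships and a fuzzy performance measure more concrete, the following is a minimal sketch and not the definitions used in the chapter. It assumes that a sample's membership vector is simply the fraction of annotators voting for each class, and that agreement between a predicted and a reference membership vector is scored as the sum of element-wise minima. The function names, the class labels, and the choice of measure are all hypothetical and chosen for illustration only.

```python
from collections import Counter


def fuzzy_membership(votes, classes):
    """Turn one sample's annotator votes into a membership vector.

    Assumption: membership of a class = fraction of annotators who chose it.
    """
    if not votes:
        raise ValueError("at least one annotator vote is required")
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]


def fuzzy_overlap(predicted, reference):
    """Illustrative fuzzy score: sum of element-wise minima.

    Gives 1.0 for identical membership vectors that sum to one
    and 0.0 for vectors with no shared mass.
    """
    return sum(min(p, r) for p, r in zip(predicted, reference))


# Hypothetical class set and annotations for a single sample.
classes = ["negative", "neutral", "positive"]
reference = fuzzy_membership(
    ["positive", "positive", "neutral", "positive"], classes
)

# Hypothetical soft classifier output for the same sample.
predicted = [0.0, 0.5, 0.5]
score = fuzzy_overlap(predicted, reference)
```

Unlike a crisp accuracy, which would count this prediction as either right or wrong, a score of this kind gives partial credit in proportion to how much membership mass the prediction shares with the annotators' distribution.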

Acknowledgements

This paper is based on work done within the Transregional Collaborative Research Centre SFB/TRR 62 Companion-Technology for Cognitive Technical Systems funded by the German Research Foundation (DFG). The work of Markus Kächele is supported by a scholarship of the Landesgraduiertenförderung Baden-Württemberg at Ulm University.

Author information

Authors and Affiliations

Institute of Neural Information Processing, Ulm University, James-Franck-Ring, 89081, Ulm, Germany
Markus Kächele, Martin Schels, Sascha Meudt, Viktor Kessler, Michael Glodek, Patrick Thiam, Stephan Tschechne, Günther Palm & Friedhelm Schwenker

Corresponding author

Correspondence to Markus Kächele.

Editor information

Editors and Affiliations

Otto von Guericke University, Magdeburg, Germany
Ronald Böck
Trinity College, Dublin, Ireland
Francesca Bonin
Trinity College, Dublin, Ireland
Nick Campbell
Utrecht University, Utrecht, The Netherlands
Ronald Poppe

Cite this paper

Kächele, M. et al. (2015). On Annotation and Evaluation of Multi-modal Corpora in Affective Human-Computer Interaction. In: Böck, R., Bonin, F., Campbell, N., Poppe, R. (eds) Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction. MA3HMI 2014. Lecture Notes in Computer Science, vol 8757. Springer, Cham. https://doi.org/10.1007/978-3-319-15557-9_4

DOI: https://doi.org/10.1007/978-3-319-15557-9_4
Published: 12 February 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15556-2
Online ISBN: 978-3-319-15557-9
eBook Packages: Computer Science, Computer Science (R0)