Multimodal Affect Recognition in the Context of Human-Computer Interaction for Companion-Systems

Schwenker, Friedhelm; Böck, Ronald; Schels, Martin; Meudt, Sascha; Siegert, Ingo; Glodek, Michael; Kächele, Markus; Schmidt-Wack, Miriam; Thiam, Patrick; Wendemuth, Andreas; Krell, Gerald

doi:10.1007/978-3-319-43665-4_19

Part of the book series: Cognitive Technologies (COGTECH)

Abstract

Humans generally interact with each other using multiple modalities. The main channels are speech, facial expressions, and gestures. Bio-physiological data such as biopotentials can also convey valuable information that can be used to interpret the communication. A Companion-System can use these modalities to make human-computer interaction (HCI) more efficient. To do so, the multiple sources need to be analyzed and combined in a technical system. So far, however, only a few studies have dealt with the fusion of three or more such modalities. This chapter addresses the processing steps needed to develop a multimodal system that applies fusion approaches.

Two tools, ATLAS and ikannotate, are presented; both are designed for pre-analyzing multimodal data streams and labeling their relevant parts. ATLAS displays raw data, extracted features, and even the outputs of pre-trained classifier modules. It also integrates annotation, transcription, and an active learning module. ikannotate can be used directly for transcription and for guided, step-wise emotional annotation of multimodal data. It includes the three most widely used annotation paradigms: basic emotions, the Geneva emotion wheel, and self-assessment manikins (SAMs). Annotators using ikannotate can also assign an uncertainty to each sample.

Classifier architectures need to realize a fusion system that combines the multiple modalities. A large number of machine learning approaches were evaluated, including data-, feature-, score-, and decision-level fusion schemes as well as temporal fusion architectures and partially supervised learning.
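
The difference between the fusion levels named above can be made concrete with a small sketch. The following Python code is only an illustration under stated assumptions, not the architectures evaluated in the chapter: the function names, the three modalities, and the per-class scores are all hypothetical. It contrasts feature-level fusion (concatenating feature vectors before classification), score-level fusion (averaging per-class scores), and decision-level fusion (majority vote over each modality's decision).

```python
from collections import Counter


def feature_level_fusion(features_per_modality):
    """Concatenate the feature vectors of all modalities into one vector."""
    return [x for feats in features_per_modality for x in feats]


def score_level_fusion(scores_per_modality, weights=None):
    """Weighted average of the per-class scores of all modalities."""
    n = len(scores_per_modality)
    if weights is None:
        weights = [1.0 / n] * n  # assumption: equal trust in every modality
    n_classes = len(scores_per_modality[0])
    fused = [0.0] * n_classes
    for w, scores in zip(weights, scores_per_modality):
        for k, s in enumerate(scores):
            fused[k] += w * s
    return fused


def decision_level_fusion(scores_per_modality):
    """Majority vote over the arg-max decision of each modality."""
    votes = [max(range(len(s)), key=s.__getitem__) for s in scores_per_modality]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical two-class scores from three unimodal classifiers.
audio = [0.90, 0.10]
video = [0.45, 0.55]
bio = [0.40, 0.60]

fused_scores = score_level_fusion([audio, video, bio])
score_decision = fused_scores.index(max(fused_scores))
vote_decision = decision_level_fusion([audio, video, bio])
print(fused_scores, score_decision, vote_decision)
```

In this made-up example the two schemes disagree: the confident audio classifier dominates the averaged scores, while the majority vote follows the two less confident modalities. Which behaviour is preferable depends on how reliable each modality's confidence values are.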

The proposed methods are evaluated either on multimodal benchmark corpora or on the datasets of the Transregional Collaborative Research Centre SFB/TRR 62, i.e., the Last Minute Corpus and the EmoRec Dataset. We also present results achieved in international challenges.

Notes

1. The g-mean was chosen because of the strong imbalance between the two classes.
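
To illustrate why the g-mean suits imbalanced two-class problems, here is a minimal sketch. It assumes the definition commonly used in imbalanced learning, the geometric mean of sensitivity and specificity; the chapter's exact formula is not shown on this page, and the function name and toy labels are hypothetical.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for two classes."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    # Guard against an empty class to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data: 95 negative and 5 positive samples.
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
y_majority = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)
print(accuracy, g_mean(y_true, y_majority))

# A classifier that finds 4 of 5 positives and 90 of 95 negatives.
y_better = [0] * 90 + [1] * 5 + [1] * 4 + [0]
print(g_mean(y_true, y_better))
```

The majority-class predictor reaches high accuracy on this toy data but a g-mean of zero, because it never detects the minority class. The measure rewards a classifier only if it performs well on both classes.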

Acknowledgements

We thank our highly regarded late colleague and friend Prof. Dr. Bernd Michaelis, who contributed to the SFB on various topics and provided well-informed suggestions. This work was done within the Transregional Collaborative Research Centre SFB/TRR 62 “Companion-Technology for Cognitive Technical Systems” funded by the German Research Foundation (DFG).

Author information

Authors and Affiliations

Institute for Neural Information Processing, University of Ulm, 89069, Ulm, Germany
Friedhelm Schwenker, Martin Schels, Sascha Meudt, Michael Glodek, Markus Kächele, Miriam Schmidt-Wack & Patrick Thiam
Cognitive Systems Group, Institute for Information Technology and Communications, Otto von Guericke University, PO Box 4120, 39106, Magdeburg, Germany
Ronald Böck, Ingo Siegert & Andreas Wendemuth
Center for Behavioral Brain Sciences, 39118, Magdeburg, Germany
Andreas Wendemuth
Technical Computer Science Group, Institute for Information Technology and Communications, Otto von Guericke University, PO Box 4120, 39106, Magdeburg, Germany
Gerald Krell

Corresponding author

Correspondence to Friedhelm Schwenker.

Editor information

Editors and Affiliations

Institute of Artificial Intelligence, Universität Ulm, Ulm, Germany
Susanne Biundo
Cognitive Systems Group, Institute for Information Technology and Communications (IIKT) and Center for Behavioral Brain Sciences (CBBS), Otto-von-Guericke Universität Magdeburg, Magdeburg, Germany
Andreas Wendemuth

About this chapter

Cite this chapter

Schwenker, F. et al. (2017). Multimodal Affect Recognition in the Context of Human-Computer Interaction for Companion-Systems. In: Biundo, S., Wendemuth, A. (eds) Companion Technology. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-43665-4_19

DOI: https://doi.org/10.1007/978-3-319-43665-4_19
Published: 05 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43664-7
Online ISBN: 978-3-319-43665-4
eBook Packages: Computer Science, Computer Science (R0)
