Improved Multimodal Emotion Recognition for Better Game-Based Learning

  • Kiavash Bahreini
  • Rob Nadolski
  • Wim Westera
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9221)


This paper introduces the integration of the face emotion recognition and voice emotion recognition components of our FILTWAM framework, which uses webcams and microphones. The framework enables real-time multimodal emotion recognition of learners during game-based learning, so that feedback can be triggered to improve learning. The main goal of this study is to validate the integration of webcam and microphone data for real-time, adequate interpretation of facial and vocal expressions into emotional states, with the software modules calibrated with end users. This integration aims to make feedback more timely and relevant, which is expected to increase learners’ awareness of their own behavior. Twelve test persons received the same computer-based tasks, in which they were asked to mimic specific facial and vocal expressions. Each test person mimicked 80 emotions, yielding a dataset of 960 emotions. All sessions were recorded on video. Agreement, measured as a Kappa value, is 0.61 overall (based on the requested emotions, expert opinions, and the recognized emotions), 0.76 for the face emotion recognition software, and 0.58 for the voice emotion recognition software. Multimodal fusion of the software modules can increase the accuracy to 78%. In contrast with existing software, our software modules allow real-time, continuous, and unobtrusive monitoring of learners’ facial expressions and voice intonations and convert these into emotional states. This inclusion of learners’ emotional states paves the way for more effective, efficient, and enjoyable game-based learning.
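The Kappa values reported above measure agreement corrected for chance. As an illustration only, the following minimal sketch computes Cohen's kappa between a list of requested emotions and a list of recognized emotions. The function name `cohen_kappa`, the emotion labels, and the toy data are assumptions made for this example; they are not taken from the FILTWAM software or from the study's dataset, and the paper's own computation may differ.

```python
from collections import Counter


def cohen_kappa(requested, recognized):
    """Cohen's kappa between two equal-length label sequences."""
    if len(requested) != len(recognized) or not requested:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(requested)
    # Observed agreement: fraction of items where both labels match.
    p_o = sum(a == b for a, b in zip(requested, recognized)) / n
    # Chance agreement: product of the marginal label counts, summed over labels.
    freq_a = Counter(requested)
    freq_b = Counter(recognized)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both sequences use one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data (not from the study): 8 requested vs. recognized emotions.
requested = ["happy", "happy", "sad", "sad", "angry", "angry", "neutral", "neutral"]
recognized = ["happy", "happy", "sad", "neutral", "angry", "sad", "neutral", "neutral"]

kappa = cohen_kappa(requested, recognized)
print(f"kappa = {kappa:.3f}")
```

In this toy case the observed agreement is 6/8 = 0.75 and the chance agreement is 16/64 = 0.25, so kappa is (0.75 - 0.25) / (1 - 0.25), roughly 0.667. Raw accuracy (0.75 here) is higher than kappa because it does not discount matches expected by chance.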


Game-based learning; Human-computer interaction; Multimodal emotion recognition; Real-time emotion recognition; Affective computing; Webcam; Microphone



We thank our colleagues at the Welten Institute of the Open University of the Netherlands who participated in the study on integrating face and voice emotion recognition. We likewise thank the two raters who helped us rate the recorded streams. We also thank the Netherlands Laboratory for Lifelong Learning (NELLL) of the Open University of the Netherlands, which sponsors this research.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Welten Institute, Research Centre for Learning, Teaching and Technology, Faculty of Psychology and Educational Sciences, Open University of the Netherlands, Heerlen, The Netherlands
