Multimodal Information Processing for Affective Computing

Abstract

Affective computing is computing that relates to, arises from, or deliberately influences emotions [1]; it aims to give computers the human-like capabilities of observing, interpreting, and generating affect features. It is an important topic in human–computer interaction (HCI) because it helps improve the quality of communication between humans and computers.


References

  1. Picard, R. W. (1997). Affective Computing. MIT Press, Cambridge, MA.

  2. James, W. (1884). What is an emotion? Mind, 9(34), 188–205.

  3. Oatley, K. (1987). Cognitive science and the understanding of emotions. Cogn. Emotion, 3(1), 209–216.

  4. Bigun, E. S., Bigun, J., Duc, B., Fischer, S. (1997). Expert conciliation for multimodal person authentication systems using Bayesian statistics. In: Int. Conf. on Audio- and Video-Based Biometric Person Authentication (AVBPA), Crans-Montana, Switzerland, 291–300.

  5. Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychol. Bull., 99(2), 143–165.

  6. Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech Commun., 40, 227–256.

  7. Scherer, K. R., Banse, R., Wallbott, H. G. (2001). Emotion inferences from vocal expression correlate across languages and cultures. J. Cross-Cultural Psychol., 32(1), 76–92.

  8. Johnstone, T., van Reekum, C. M., Scherer, K. R. (2001). Vocal correlates of appraisal processes. In: Scherer, K. R., Schorr, A., Johnstone, T. (eds) Appraisal Processes in Emotion: Theory, Methods, Research. Oxford University Press, New York and Oxford, 271–284.

  9. Petrushin, V. A. (2000). Emotion recognition in speech signal: Experimental study, development and application. In: 6th Int. Conf. on Spoken Language Processing (ICSLP 2000), Beijing, 222–225.

  10. Gobl, C., Chasaide, A. N. (2003). The role of voice quality in communicating emotion, mood and attitude. Speech Commun., 40(1–2), 189–212.

  11. Tato, R., Santos, R., Kompe, R., Pardo, J. M. (2002). Emotional space improves emotion recognition. In: ICSLP 2002, Denver, CO, 2029–2032.

  12. Dellaert, F., Polzin, T., Waibel, A. (1996). Recognizing emotion in speech. In: ICSLP 1996, Philadelphia, PA, 1970–1973.

  13. Lee, C. M., Narayanan, S., Pieraccini, R. (2001). Recognition of negative emotions from the speech signal. In: IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2001).

  14. Yu, F., Chang, E., Xu, Y. Q., Shum, H. Y. (2001). Emotion detection from speech to enrich multimedia content. In: 2nd IEEE Pacific-Rim Conf. on Multimedia, Beijing, China, 550–557.

  15. Campbell, N. (2004). Perception of affect in speech – towards an automatic processing of paralinguistic information in spoken conversation. In: ICSLP 2004, Jeju, 881–884.

  16. Cahn, J. E. (1990). The generation of affect in synthesized speech. J. Am. Voice I/O Soc., 8, 1–19.

  17. Schröder, M. (2001). Emotional speech synthesis: A review. In: Eurospeech 2001, Aalborg, Denmark, 561–564.

  18. Campbell, N. (2004). Synthesis units for conversational speech – using phrasal segments. In: Autumn Meeting of the Acoustical Society of Japan, 337–338.

  19. Schröder, M., Breuer, S. (2004). XML representation languages as a way of interconnecting TTS modules. In: 8th Int. Conf. on Spoken Language Processing (ICSLP 2004), Jeju, Korea.

  20. Eide, E., Aaron, A., Bakis, R., Hamza, W., Picheny, M., Pitrelli, J. (2002). A corpus-based approach to <ahem/> expressive speech synthesis. In: IEEE Speech Synthesis Workshop, Santa Monica, CA, 79–84.

  21. Chuang, Z. J., Wu, C. H. (2002). Emotion recognition from textual input using an emotional semantic network. In: ICSLP 2002, Denver, CO, 177–180.

  22. Tao, J. (2003). Emotion control of Chinese speech synthesis in natural environment. In: Eurospeech 2003, Geneva.

  23. Moriyama, T., Ozawa, S. (1999). Emotion recognition and synthesis system on speech. In: IEEE Int. Conf. on Multimedia Computing and Systems, Florence, Italy, 840–844.

  24. Massaro, D. W., Beskow, J., Cohen, M. M., Fry, C. L., Rodriguez, T. (1999). Picture my voice: Audio to visual speech synthesis using artificial neural networks. In: AVSP'99, Santa Cruz, CA, 133–138.

  25. Darwin, C. (1872). The Expression of the Emotions in Man and Animals. University of Chicago Press, Chicago.

  26. Etcoff, N. L., Magee, J. J. (1992). Categorical perception of facial expressions. Cognition, 44, 227–240.

  27. Ekman, P., Friesen, W. V. (1997). Manual for the Facial Action Coding System. Consulting Psychologists Press, Palo Alto, CA.

  28. Yamamoto, E., Nakamura, S., Shikano, K. (1998). Lip movement synthesis from speech based on hidden Markov models. Speech Commun., 26, 105–115.

  29. Tekalp, A. M., Ostermann, J. (2000). Face and 2-D mesh animation in MPEG-4. Signal Process.: Image Commun., 15, 387–421.

  30. Lyons, M. J., Akamatsu, S., Kamachi, M., Gyoba, J. (1998). Coding facial expressions with Gabor wavelets. In: 3rd IEEE Int. Conf. on Automatic Face and Gesture Recognition, Nara, Japan, 200–205.

  31. Calder, A. J., Burton, A. M., Miller, P., Young, A. W., Akamatsu, S. (2001). A principal component analysis of facial expression. Vis. Res., 41, 1179–1208.

  32. Kobayashi, H., Hara, F. (1992). Recognition of six basic facial expressions and their strength by neural network. In: Int. Workshop on Robotics and Human Communications, New York, 381–386.

  33. Bregler, C., Covell, M., Slaney, M. (1997). Video rewrite: Driving visual speech with audio. In: ACM SIGGRAPH'97, Los Angeles, CA, 353–360.

  34. Cosatto, E., Potamianos, G., Graf, H. P. (2000). Audio-visual unit selection for the synthesis of photo-realistic talking-heads. In: IEEE Int. Conf. on Multimedia and Expo, New York, 619–622.

  35. Ezzat, T., Poggio, T. (1998). MikeTalk: A talking facial display based on morphing visemes. In: Computer Animation Conf., Philadelphia, PA, 456–459.

  36. Gutierrez-Osuna, R., Rundomin, J. L. (2005). Speech-driven facial animation with realistic dynamics. IEEE Trans. Multimedia, 7, 33–42.

  37. Hong, P. Y., Wen, Z., Huang, T. S. (2002). Real-time speech-driven face animation with expressions using neural networks. IEEE Trans. Neural Netw., 13, 916–927.

  38. Verma, A., Subramaniam, L. V., Rajput, N., Neti, C., Faruquie, T. A. (2004). Animating expressive faces across languages. IEEE Trans. Multimedia, 6, 791–800.

  39. Collier, G. (1985). Emotional Expression. Lawrence Erlbaum Associates. http://faculty.uccb.ns.ca/~gcollier/

  40. Argyle, M. (1988). Bodily Communication. Methuen & Co, New York, NY.

  41. Siegman, A. W., Feldstein, S. (1985). Multichannel Integrations of Nonverbal Behavior. Lawrence Erlbaum Associates, Hillsdale, NJ.

  42. Feldman, R. S., Philippot, P., Custrini, R. J. (1991). Social competence and nonverbal behavior. In: Feldman, R. S., Rimé, B. (eds) Fundamentals of Nonverbal Behavior. Cambridge University Press, Cambridge, 329–350.

  43. Knapp, M. L., Hall, J. A. (2006). Nonverbal Communication in Human Interaction, 6th edn. Thomson Wadsworth, Belmont, CA.

  44. Go, H. J., Kwak, K. C., Lee, D. J., Chun, M. G. (2003). Emotion recognition from facial image and speech signal. In: Int. Conf. of the Society of Instrument and Control Engineers, Fukui, Japan, 2890–2895.

  45. Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C. M., et al. (2004). Analysis of emotion recognition using facial expressions, speech and multimodal information. In: Int. Conf. on Multimodal Interfaces, State College, PA, 205–211.

  46. Song, M., Bu, J., Chen, C., Li, N. (2004). Audio-visual based emotion recognition – A new approach. In: Int. Conf. on Computer Vision and Pattern Recognition, Washington, DC, 1020–1025.

  47. Zeng, Z., Tu, J., Liu, M., Zhang, T., Rizzolo, N., Zhang, Z., Huang, T. S., Roth, D., Levinson, S. (2004). Bimodal HCI-related emotion recognition. In: Int. Conf. on Multimodal Interfaces, State College, PA, 137–143.

  48. Zeng, Z., Tu, J., Pianfetti, B., Huang, T. S. (2008). Audio-visual affective expression recognition through multi-stream fused HMM. IEEE Trans. Multimedia, 10(4), 570–577.

  49. Zeng, Z., Tu, J., Liu, M., Huang, T. S., Pianfetti, B., Roth, D., Levinson, S. (2007). Audio-visual affect recognition. IEEE Trans. Multimedia, 9(2), 424–428.

  50. Wang, Y., Guan, L. (2005). Recognizing human emotion from audiovisual information. In: ICASSP 2005, Philadelphia, PA, II, 1125–1128.

  51. Hoch, S., Althoff, F., McGlaun, G., Rigoll, G. (2005). Bimodal fusion of emotional data in an automotive environment. In: ICASSP 2005, Philadelphia, PA, II, 1085–1088.

  52. Fragopanagos, F., Taylor, J. G. (2005). Emotion recognition in human–computer interaction. Neural Netw., 18, 389–405.

  53. Pal, P., Iyer, A. N., Yantorno, R. E. (2006). Emotion detection from infant facial expressions and cries. In: ICASSP 2006, Toulouse, France, 2, 721–724.

  54. Caridakis, G., Malatesta, L., Kessous, L., Amir, N., Raouzaiou, A., Karpouzis, K. (2006). Modeling naturalistic affective states via facial and vocal expression recognition. In: Int. Conf. on Multimodal Interfaces, Banff, Alberta, Canada, 146–154.

  55. Karpouzis, K., Caridakis, G., Kessous, L., Amir, N., Raouzaiou, A., Malatesta, L., Kollias, S. (2007). Modeling naturalistic affective states via facial, vocal, and bodily expression recognition. In: Lecture Notes in Artificial Intelligence, 4451, 91–112.

  56. Chen, C. Y., Huang, Y. K., Cook, P. (2005). Visual/acoustic emotion recognition. In: Int. Conf. on Multimedia and Expo, Amsterdam, The Netherlands, 1468–1471.

  57. Picard, R. W. (2003). Affective computing: Challenges. Int. J. Hum. Comput. Studies, 59, 55–64.

  58. Ortony, A., Clore, G. L., Collins, A. (1990). The Cognitive Structure of Emotions. Cambridge University Press, Cambridge.

  59. Carberry, S., de Rosis, F. (2008). Introduction to the special issue of UMUAI on 'Affective Modeling and Adaptation'. User Modeling and User-Adapted Interaction, 18, 1–9.

  60. Esposito, A., Balodis, G., Ferreira, A., Cristea, G. (2006). Cross-Modal Analysis of Verbal and Non-verbal Communication. Proposal for a COST Action.

  61. Yin, P. R., Tao, J. H. (2005). Dynamic mapping method based speech driven face animation system. In: 1st Int. Conf. on Affective Computing and Intelligent Interaction (ACII 2005), Beijing, 755–763.

  62. O'Brien, J. F., Bodenheimer, B., Brostow, G., Hodgins, J. (2000). Automatic joint parameter estimation from magnetic motion capture data. In: Graphics Interface 2000, Montreal, Canada, 53–60.

  63. Aggarwal, J. K., Cai, Q. (1999). Human motion analysis: A review. Comput. Vision Image Understand., 73(3), 428–440.

  64. Gavrila, D. M. (1999). The visual analysis of human movement: A survey. Comput. Vision Image Understand., 73(1), 82–98.

  65. Azarbayejani, A., Wren, C., Pentland, A. (1996). Real-time 3-D tracking of the human body. In: IMAGE'COM 96, Bordeaux, France.

  66. Camurri, A., Poli, G. D., Leman, M., Volpe, G. (2001). A multi-layered conceptual framework for expressive gesture applications. In: Int. EU-TMR MOSART Workshop, Barcelona.

  67. Cowie, R., et al. (2001). Emotion recognition in human–computer interaction. IEEE Signal Process. Mag., 18(1), 32–80.

  68. Brunelli, R., Falavigna, D. (1995). Person identification using multiple cues. IEEE Trans. Pattern Anal. Mach. Intell., 17(10), 955–966.

  69. Kumar, A., Wong, D. C., Shen, H. C., Jain, A. K. (2003). Personal verification using palmprint and hand geometry biometric. In: 4th Int. Conf. on Audio- and Video-Based Biometric Person Authentication, Guildford, UK, 668–678.

  70. Frischholz, R. W., Dieckmann, U. (2000). BioID: A multimodal biometric identification system. IEEE Comput., 33(2), 64–68.

  71. Jain, A. K., Ross, A. (2002). Learning user-specific parameters in a multibiometric system. In: Int. Conf. on Image Processing (ICIP), Rochester, NY, 57–60.

  72. Ho, T. K., Hull, J. J., Srihari, S. N. (1994). Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell., 16(1), 66–75.

  73. Kittler, J., Hatef, M., Duin, R. P. W., Matas, J. (1998). On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell., 20(3), 226–239.

  74. Dieckmann, U., Plankensteiner, P., Wagner, T. (1997). SESAM: A biometric person identification system using sensor fusion. Pattern Recognit. Lett., 18, 827–833.

  75. Silva, D., Miyasato, T., Nakatsu, R. (1997). Facial emotion recognition using multi-modal information. In: Int. Conf. on Information, Communications and Signal Processing (ICICS), Singapore, 397–401.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 60575032 and the 863 Program under Grant 2006AA01Z138.

Author information

Correspondence to Jianhua Tao.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Tao, J. (2010). Multimodal Information Processing for Affective Computing. In: Chen, F., Jokinen, K. (eds) Speech Technology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-73819-2_9

  • DOI: https://doi.org/10.1007/978-0-387-73819-2_9

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-73818-5

  • Online ISBN: 978-0-387-73819-2

