Decision-Level Fusion for Audio-Visual Laughter Detection

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2008)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5237)

Abstract

Laughter is a highly variable signal that can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed by fusing the results of separate audio and video classifiers at the decision level. This results in laughter detection with a significantly higher AUC-ROC than single-modality classification.
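The fusion scheme described in the abstract can be illustrated with a minimal sketch. Note the assumptions: the paper fuses separate audio and video classifier outputs at the decision level, but the specific fusion rule below (a weighted sum of per-sample scores) and the weight value are illustrative choices, not the authors' actual configuration; the toy data is fabricated for demonstration.

```python
# Hypothetical sketch of decision-level fusion with AUC-ROC evaluation.
# The weighted-sum rule and the toy scores below are assumptions for
# illustration only, not the method or data reported in the paper.

def fuse_scores(audio_scores, video_scores, w_audio=0.5):
    """Fuse per-sample scores from two classifiers by a weighted sum."""
    return [w_audio * a + (1.0 - w_audio) * v
            for a, v in zip(audio_scores, video_scores)]

def auc_roc(scores, labels):
    """AUC-ROC via the rank-sum (Mann-Whitney) formulation: the fraction
    of positive/negative pairs ranked correctly, ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: each single modality makes some errors, but the fused
# scores separate laughter (1) from non-laughter (0) more cleanly.
labels = [1, 1, 1, 0, 0, 0]
audio  = [0.9, 0.4, 0.8, 0.6, 0.2, 0.1]
video  = [0.5, 0.9, 0.6, 0.3, 0.7, 0.2]
fused  = fuse_scores(audio, video)
```

With these toy scores, the fused classifier reaches a higher AUC-ROC than either modality alone, mirroring (in miniature) the effect the abstract reports.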


Editor information

Editors: Andrei Popescu-Belis, Rainer Stiefelhagen

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Reuderink, B., Poel, M., Truong, K., Poppe, R., Pantic, M. (2008). Decision-Level Fusion for Audio-Visual Laughter Detection. In: Popescu-Belis, A., Stiefelhagen, R. (eds) Machine Learning for Multimodal Interaction. MLMI 2008. Lecture Notes in Computer Science, vol 5237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85853-9_13

  • DOI: https://doi.org/10.1007/978-3-540-85853-9_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85852-2

  • Online ISBN: 978-3-540-85853-9

  • eBook Packages: Computer Science (R0)
