Decision-Level Fusion for Audio-Visual Laughter Detection

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2008)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5237)

Abstract

Laughter is a highly variable signal that can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed by fusing the results of separate audio and video classifiers at the decision level. This results in laughter detection with a significantly higher AUC-ROC than single-modality classification.
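The fusion scheme described in the abstract can be illustrated with a minimal sketch. Note the assumptions: the paper fuses separate audio and video classifier outputs at the decision level, but the specific fusion rule below (a weighted sum of per-sample scores) and the weight value are illustrative choices, not the authors' actual configuration; the toy data is fabricated for demonstration.

```python
# Hypothetical sketch of decision-level fusion with AUC-ROC evaluation.
# The weighted-sum rule and the toy scores below are assumptions for
# illustration only, not the method or data reported in the paper.

def fuse_scores(audio_scores, video_scores, w_audio=0.5):
    """Fuse per-sample scores from two classifiers by a weighted sum."""
    return [w_audio * a + (1.0 - w_audio) * v
            for a, v in zip(audio_scores, video_scores)]

def auc_roc(scores, labels):
    """AUC-ROC via the rank-sum (Mann-Whitney) formulation: the fraction
    of positive/negative pairs ranked correctly, ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: each single modality makes some errors, but the fused
# scores separate laughter (1) from non-laughter (0) more cleanly.
labels = [1, 1, 1, 0, 0, 0]
audio  = [0.9, 0.4, 0.8, 0.6, 0.2, 0.1]
video  = [0.5, 0.9, 0.6, 0.3, 0.7, 0.2]
fused  = fuse_scores(audio, video)
```

With these toy scores, the fused classifier reaches a higher AUC-ROC than either modality alone, mirroring (in miniature) the effect the abstract reports.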


Editor information

Editors: Andrei Popescu-Belis, Rainer Stiefelhagen

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Reuderink, B., Poel, M., Truong, K., Poppe, R., Pantic, M. (2008). Decision-Level Fusion for Audio-Visual Laughter Detection. In: Popescu-Belis, A., Stiefelhagen, R. (eds) Machine Learning for Multimodal Interaction. MLMI 2008. Lecture Notes in Computer Science, vol 5237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85853-9_13

  • DOI: https://doi.org/10.1007/978-3-540-85853-9_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85852-2

  • Online ISBN: 978-3-540-85853-9

  • eBook Packages: Computer Science (R0)
