Unsupervised Temporal Segmentation of Talking Faces Using Visual Cues to Improve Emotion Recognition

Conference paper
Affective Computing and Intelligent Interaction (ACII 2011)

Abstract

The mouth region of the human face carries highly discriminative information about facial expressions. Facial expression analysis for inferring a user's emotional state becomes very challenging when the user talks, because many of the mouth movements made while uttering words resemble the mouth shapes that express emotions. We introduce a novel unsupervised method to temporally segment talking faces from faces displaying only emotions, and we use the resulting talking-face segments to improve emotion recognition. The proposed method represents mouth features with an integrated gradient histogram of local binary patterns and identifies temporal segments of talking faces online by estimating the uncertainty of mouth movements over time. The algorithm accurately identifies talking-face segments on a real-world database in which talking and emotional expression occur naturally, and the emotion recognition system, using the talking-face cues, shows a considerable improvement in recognition accuracy.
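To make the pipeline concrete, below is a minimal sketch in Python, not the authors' implementation: it substitutes plain LBP histograms for the paper's integrated gradient histogram of LBPs, and approximates the "uncertainty of mouth movements" with the mean chi-square distance between consecutive histograms in a sliding temporal window. The window size, threshold, and distance measure are illustrative assumptions.

```python
# Hedged sketch of the abstract's idea: describe mouth crops with local
# binary pattern (LBP) histograms, then flag "talking" frames where the
# mouth texture changes rapidly over a sliding window. Plain LBP and
# chi-square distance stand in for the paper's integrated gradient
# histogram and uncertainty estimate; all parameters are assumptions.
import numpy as np

def lbp_histogram(gray_mouth: np.ndarray, bins: int = 256) -> np.ndarray:
    """Normalized 8-neighbour LBP histogram of a grayscale mouth crop."""
    img = gray_mouth.astype(np.int32)
    center = img[1:-1, 1:-1]
    code = np.zeros_like(center)
    # Offsets of the 8 neighbours, clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:img.shape[0] - 1 + dy,
                    1 + dx:img.shape[1] - 1 + dx]
        code |= (neigh >= center).astype(np.int32) << bit
    hist, _ = np.histogram(code, bins=bins, range=(0, bins))
    return hist / max(hist.sum(), 1)

def talking_mask(mouth_frames, window: int = 15, threshold: float = 0.05):
    """Return a boolean per-frame mask: True where the mouth is 'talking'.

    Talking mouths change texture rapidly from frame to frame, while
    emotional holds change slowly, so the mean inter-frame histogram
    distance within the window serves as the uncertainty proxy here.
    """
    hists = [lbp_histogram(f) for f in mouth_frames]
    dists = [0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + 1e-9))
             for h1, h2 in zip(hists[:-1], hists[1:])]
    dists = np.array([0.0] + dists)  # pad so dists aligns with frames
    mask = np.zeros(len(mouth_frames), dtype=bool)
    for t in range(len(mask)):
        lo = max(0, t - window // 2)
        hi = min(len(mask), t + window // 2 + 1)
        mask[t] = dists[lo:hi].mean() > threshold
    return mask
```

In use, `mouth_frames` would be grayscale mouth crops extracted per frame by any face-landmark tracker; frames flagged True can then be excluded or down-weighted by the emotion classifier, which is the role the talking-face cue plays in the paper.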

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Velusamy, S., Gopalakrishnan, V., Navathe, B., Kannan, H., Anand, B., Sharma, A. (2011). Unsupervised Temporal Segmentation of Talking Faces Using Visual Cues to Improve Emotion Recognition. In: D’Mello, S., Graesser, A., Schuller, B., Martin, JC. (eds) Affective Computing and Intelligent Interaction. ACII 2011. Lecture Notes in Computer Science, vol 6974. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24600-5_45

  • DOI: https://doi.org/10.1007/978-3-642-24600-5_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24599-2

  • Online ISBN: 978-3-642-24600-5

  • eBook Packages: Computer Science; Computer Science (R0)
