Spatio-temporal Event Classification Using Time-Series Kernel Based Structured Sparsity

  • László A. Jeni
  • András Lőrincz
  • Zoltán Szabó
  • Jeffrey F. Cohn
  • Takeo Kanade
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8692)


In many behavioral domains, such as facial expression and gesture, sparse structure is prevalent. This sparsity would be well suited to event detection but for one problem: features are typically confounded by alignment error in space and time. As a consequence, high-dimensional representations such as SIFT and Gabor features have been favored despite their much greater computational cost and potential loss of information. We propose a Kernel Structured Sparsity (KSS) method that handles both the temporal alignment problem and the structured sparse reconstruction within a common framework, and that can rely on simple features. We characterize spatio-temporal events as time series of motion patterns and, by utilizing time-series kernels, apply standard structured-sparse coding techniques to tackle this important problem. We evaluated the KSS method on both gesture and facial expression datasets that include spontaneous behavior and differ in degree of difficulty and type of ground-truth coding. KSS outperformed both sparse and non-sparse methods that utilize complex image features and their temporal extensions. In early facial event classification, KSS achieved 10% higher accuracy, as measured by F1 score, than kernel SVM methods.
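The combination described above, a time-series kernel feeding a structured-sparse coder, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which kernel or penalty is used, so the sketch assumes a Gaussian kernel on the dynamic time warping (DTW) cost as a stand-in time-series kernel and a group-lasso penalty with one group per class as the structured-sparsity term. All function names, the toy sine and ramp data, and the values of `sigma` and `lam` are hypothetical. A DTW-based kernel is not guaranteed to be positive semi-definite, which the step-size choice below tolerates.

```python
import numpy as np

def dtw_cost(x, y):
    """Accumulated squared-difference cost of the best monotone alignment."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.sum((x[i - 1] - y[j - 1]) ** 2)
            acc[i, j] = d + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]

def dtw_kernel(x, y, sigma=1.0):
    """Assumed stand-in time-series kernel: Gaussian of the DTW cost."""
    return np.exp(-dtw_cost(x, y) / (2.0 * sigma ** 2))

def group_sparse_code(K, k_x, groups, lam=0.05, n_iter=500):
    """Proximal gradient for 0.5*a'Ka - a'k_x + lam * sum_g ||a_g||_2."""
    # Step size from the largest absolute eigenvalue (K may be indefinite).
    step = 1.0 / np.abs(np.linalg.eigvalsh(K)).max()
    a = np.zeros(len(k_x))
    for _ in range(n_iter):
        z = a - step * (K @ a - k_x)           # gradient step on the smooth part
        for g in np.unique(groups):            # group soft-thresholding
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            z[idx] = scale * z[idx]
        a = z
    return a

def classify(query, atoms, groups, sigma=1.0, lam=0.05):
    """Pick the class whose atoms best reconstruct the query in feature space."""
    K = np.array([[dtw_kernel(p, q, sigma) for q in atoms] for p in atoms])
    k_x = np.array([dtw_kernel(query, p, sigma) for p in atoms])
    a = group_sparse_code(K, k_x, groups, lam)
    residuals = {}
    for g in np.unique(groups):
        idx = groups == g
        a_g = a[idx]
        # k(query, query) = 1 for this kernel, since dtw_cost(x, x) = 0.
        residuals[int(g)] = 1.0 - 2.0 * a_g @ k_x[idx] + a_g @ K[idx][:, idx] @ a_g
    return min(residuals, key=residuals.get)

def sine(n, shift):
    t = np.linspace(0.0, 1.0, n)
    return np.sin(2 * np.pi * (t + shift))

def ramp(n, offset):
    return np.linspace(-1.0, 1.0, n) + offset

# Toy dictionary: sequences of different lengths, so no fixed-length alignment exists.
atoms = [sine(18, 0.0), sine(20, 0.02), sine(22, -0.02),
         ramp(18, 0.0), ramp(20, 0.1), ramp(22, -0.1)]
groups = np.array([0, 0, 0, 1, 1, 1])

pred_sine = classify(sine(25, 0.03), atoms, groups)
pred_ramp = classify(ramp(24, 0.05), atoms, groups)
print(pred_sine, pred_ramp)
```

Because the coder only ever touches the Gram matrix `K` and the vector `k_x`, sequences of unequal length never need to be resampled or aligned explicitly; the kernel absorbs the temporal alignment, and the group penalty encourages the reconstruction to draw on atoms of a single class.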


structured sparsity, time-series kernels, facial expression classification, gesture recognition


Supplementary material

978-3-319-10593-2_10_MOESM1_ESM.pdf (257 KB)
978-3-319-10593-2_10_MOESM2_ESM.mp4 (15.4 MB)



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • László A. Jeni (1)
  • András Lőrincz (2)
  • Zoltán Szabó (3)
  • Jeffrey F. Cohn (1, 4)
  • Takeo Kanade (1)
  1. Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
  2. Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary
  3. Gatsby Computational Neuroscience Unit, University College London, London, UK
  4. Department of Psychology, University of Pittsburgh, Pittsburgh, USA
