Spatio-temporal Event Classification Using Time-Series Kernel Based Structured Sparsity

Jeni, László A.; Lőrincz, András; Szabó, Zoltán; Cohn, Jeffrey F.; Kanade, Takeo

doi:10.1007/978-3-319-10593-2_10


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8692)

Included in the following conference series:

European Conference on Computer Vision


Abstract

In many behavioral domains, such as facial expression and gesture, sparse structure is prevalent. This sparsity would be well suited for event detection were it not for one problem: features are typically confounded by alignment error in space and time. As a consequence, high-dimensional representations such as SIFT and Gabor features have been favored despite their much greater computational cost and potential loss of information. We propose a Kernel Structured Sparsity (KSS) method that handles both the temporal alignment problem and the structured sparse reconstruction within a common framework while relying on simple features. We characterize spatio-temporal events as time series of motion patterns, and by using time-series kernels we can apply standard structured-sparse coding techniques to this problem. We evaluated the KSS method on both gesture and facial expression datasets that include spontaneous behavior and differ in degree of difficulty and type of ground truth coding. KSS outperformed both sparse and non-sparse methods that use complex image features and their temporal extensions. In early facial event classification, KSS had 10% higher accuracy, as measured by F₁ score, than kernel SVM methods.

Electronic supplementary material: Supplementary material is available in the online version of this chapter at http://dx.doi.org/10.1007/978-3-319-10593-2_10. Videos can also be accessed at http://www.springerimages.com/videos/978-3-319-10592-5
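
The combination described in the abstract, a time-series kernel feeding a structured-sparse reconstruction, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a global-alignment-style kernel with a local kernel of the form g / (2 - g), a group-lasso penalty with one group per class, a plain proximal-gradient solver, and toy sine and ramp series standing in for motion patterns. All function names, parameter values, and data below are hypothetical choices made for illustration.

```python
import numpy as np

def log_ga_kernel(x, y, sigma=1.0):
    """Log of a global-alignment-style kernel between series x (n, d) and y (m, d)."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    scaled = d2 / (2.0 * sigma ** 2)
    # Assumed local kernel g / (2 - g) with g = exp(-scaled), kept in log form.
    log_kappa = -scaled - np.log(2.0 - np.exp(-scaled))
    n, m = d2.shape
    M = np.full((n + 1, m + 1), -np.inf)
    M[0, 0] = 0.0
    # Dynamic program that sums, in the log domain, over all monotone alignments.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = np.logaddexp(np.logaddexp(M[i - 1, j], M[i, j - 1]), M[i - 1, j - 1])
            M[i, j] = log_kappa[i - 1, j - 1] + prev
    return M[n, m]

def normalized_gram(series_a, series_b, sigma=1.0):
    """Kernel matrix between two lists of series, normalized so that k(s, s) = 1."""
    self_a = np.array([log_ga_kernel(s, s, sigma) for s in series_a])
    self_b = np.array([log_ga_kernel(s, s, sigma) for s in series_b])
    cross = np.array([[log_ga_kernel(a, b, sigma) for b in series_b] for a in series_a])
    return np.exp(cross - 0.5 * (self_a[:, None] + self_b[None, :]))

def group_sparse_code(K, k_x, groups, lam, n_iter=300):
    """Proximal gradient for 0.5 * ||phi(x) - Phi a||^2 + lam * sum_g ||a_g||_2."""
    alpha = np.zeros(len(k_x))
    step = 1.0 / np.linalg.eigvalsh(K).max()
    for _ in range(n_iter):
        z = alpha - step * (K @ alpha - k_x)  # gradient step in kernel form
        for g in np.unique(groups):
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            # Group soft-thresholding: shrink the whole group toward zero.
            z[idx] *= max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
        alpha = z
    return alpha

def classify(K, k_x, groups, lam):
    """Pick the class whose atoms alone give the smallest feature-space residual."""
    alpha = group_sparse_code(K, k_x, groups, lam)
    labels = np.unique(groups)
    residuals = []
    for g in labels:
        idx = groups == g
        a = alpha[idx]
        # k(x, x) = 1 because the kernel is normalized.
        residuals.append(1.0 - 2.0 * (k_x[idx] @ a) + a @ K[idx][:, idx] @ a)
    return labels[int(np.argmin(residuals))], alpha

def sine(T):
    return np.sin(2 * np.pi * np.linspace(0, 1, T))[:, None]

def ramp(T):
    return np.linspace(-1, 1, T)[:, None]

# Toy dictionary: class 0 holds sines, class 1 holds ramps, all of different lengths.
atoms = [sine(T) for T in (20, 25, 30)] + [ramp(T) for T in (20, 25, 30)]
groups = np.array([0, 0, 0, 1, 1, 1])
query = sine(23)  # same shape as class 0, sampled at a different rate

K = normalized_gram(atoms, atoms)
k_x = normalized_gram(atoms, [query])[:, 0]
lam = 0.1 * np.abs(k_x).max()  # hypothetical regularization weight
predicted, alpha = classify(K, k_x, groups, lam)
print("predicted class:", predicted)
```

Because the series enter only through kernel evaluations, sequences of different lengths need no explicit resampling or alignment step before sparse coding; the alignment is absorbed into the kernel, and the group penalty encourages the reconstruction to draw on atoms from a single class.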



Author information

Authors and Affiliations

Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
László A. Jeni, Jeffrey F. Cohn & Takeo Kanade
Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary
András Lőrincz
Gatsby Computational Neuroscience Unit, University College London, London, UK
Zoltán Szabó
Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
Jeffrey F. Cohn


Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
KU Leuven, ESAT - PSI, iMinds, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars


Cite this paper

Jeni, L.A., Lőrincz, A., Szabó, Z., Cohn, J.F., Kanade, T. (2014). Spatio-temporal Event Classification Using Time-Series Kernel Based Structured Sparsity. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8692. Springer, Cham. https://doi.org/10.1007/978-3-319-10593-2_10


DOI: https://doi.org/10.1007/978-3-319-10593-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10592-5
Online ISBN: 978-3-319-10593-2
eBook Packages: Computer Science, Computer Science (R0)
