Abstract
In many behavioral domains, such as facial expression and gesture, sparse structure is prevalent. This sparsity would be well suited to event detection but for one problem: features are typically confounded by alignment error in space and time. As a consequence, high-dimensional representations such as SIFT and Gabor features have been favored despite their much greater computational cost and potential loss of information. We propose a Kernel Structured Sparsity (KSS) method that handles both the temporal alignment problem and the structured sparse reconstruction within a common framework, and that can rely on simple features. We characterize spatio-temporal events as time series of motion patterns; by using time-series kernels, we can apply standard structured-sparse coding techniques to the event classification problem. We evaluated the KSS method on both gesture and facial expression datasets that include spontaneous behavior and differ in degree of difficulty and type of ground truth coding. KSS outperformed both sparse and non-sparse methods that use complex image features and their temporal extensions. In early facial event classification, KSS had 10% higher accuracy, as measured by F1 score, than kernel SVM methods.
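To make the combination of time-series kernels and structured-sparse coding concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not say which kernel or solver is used, so everything here is an assumption: a global alignment kernel with a plain Gaussian local similarity (positive definiteness is not guaranteed for this simplified form), a group-lasso penalty with one group per class, and a basic proximal gradient solver. The function names `log_ga_kernel`, `normalized_kernel`, and `kernel_group_sparse_code`, and the toy sine data, are hypothetical.

```python
import numpy as np

def log_ga_kernel(x, y, sigma=1.0):
    """Log of a global alignment kernel between series x (n, d) and y (m, d).

    Assumption: Gaussian local similarity; sums over all monotone alignments.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # Log of the local similarity between every pair of frames.
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    log_k = -sq_dist / (2.0 * sigma ** 2)
    # log_m[i, j] accumulates all alignments of x[:i] with y[:j] (log domain).
    log_m = np.full((n + 1, m + 1), -np.inf)
    log_m[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = np.logaddexp(
                np.logaddexp(log_m[i - 1, j], log_m[i, j - 1]),
                log_m[i - 1, j - 1],
            )
            log_m[i, j] = log_k[i - 1, j - 1] + prev
    return log_m[n, m]

def normalized_kernel(x, y, sigma=1.0):
    """Kernel value rescaled so that every series has unit self-similarity."""
    log_xy = log_ga_kernel(x, y, sigma)
    log_xx = log_ga_kernel(x, x, sigma)
    log_yy = log_ga_kernel(y, y, sigma)
    return float(np.exp(log_xy - 0.5 * (log_xx + log_yy)))

def kernel_group_sparse_code(K, k, groups, lam=0.1, n_iter=200):
    """Minimise 0.5 a^T K a - k^T a + lam * sum_g ||a_g||_2 by proximal gradient.

    K: Gram matrix of the dictionary, k: kernel values between dictionary and
    query, groups: list of index arrays (here assumed to be one per class).
    """
    step = 1.0 / max(np.linalg.eigvalsh(K).max(), 1e-12)
    a = np.zeros(len(k))
    for _ in range(n_iter):
        v = a - step * (K @ a - k)  # gradient step on the smooth part
        for g in groups:            # group soft-thresholding (proximal step)
            norm = np.linalg.norm(v[g])
            v[g] *= max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
        a = v
    return a

# Hypothetical toy data: two classes of 1-D traces with different lengths.
def toy_series(freq, length):
    t = np.linspace(0.0, 1.0, length)
    return np.sin(2.0 * np.pi * freq * t)[:, None]

dictionary = [toy_series(1, 20), toy_series(1, 25),
              toy_series(3, 20), toy_series(3, 25)]
groups = [np.array([0, 1]), np.array([2, 3])]
query = toy_series(1, 30)

K = np.array([[normalized_kernel(a, b) for b in dictionary] for a in dictionary])
k = np.array([normalized_kernel(a, query) for a in dictionary])
alpha = kernel_group_sparse_code(K, k, groups)
# Energy of the code within each class group; a larger value suggests that class.
group_energy = [float(np.linalg.norm(alpha[g])) for g in groups]
```

The kernel absorbs differences in sequence length and timing, so the sparse coding step only ever sees the Gram matrix `K` and the vector `k`, never the raw frames. Working in the log domain avoids underflow for long sequences.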
Electronic supplementary material: Supplementary material is available in the online version of this chapter at http://dx.doi.org/10.1007/978-3-319-10593-2_10. Videos can also be accessed at http://www.springerimages.com/videos/978-3-319-10592-5
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Jeni, L.A., Lőrincz, A., Szabó, Z., Cohn, J.F., Kanade, T. (2014). Spatio-temporal Event Classification Using Time-Series Kernel Based Structured Sparsity. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8692. Springer, Cham. https://doi.org/10.1007/978-3-319-10593-2_10
DOI: https://doi.org/10.1007/978-3-319-10593-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10592-5
Online ISBN: 978-3-319-10593-2
eBook Packages: Computer Science, Computer Science (R0)