Recognizing Complex Events in Videos by Learning Key Static-Dynamic Evidences

Lai, Kuan-Ting; Liu, Dong; Chen, Ming-Syan; Chang, Shih-Fu

doi:10.1007/978-3-319-10578-9_44

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8691)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Complex events consist of various human interactions with different objects in diverse environments. The evidences needed to recognize events may occur in short time periods of variable length and can appear anywhere in a video. This prevents conventional machine learning algorithms from recognizing the events effectively. In this paper, we propose a novel method that automatically identifies the key evidences in videos for detecting complex events. Both static instances (objects) and dynamic instances (actions) are considered, by sampling frames and temporal segments respectively. To compare the characteristic power of heterogeneous instances, we embed static and dynamic instances into a multiple instance learning framework via instance similarity measures, and cast the problem as an Evidence Selective Ranking (ESR) process. We impose an ℓ₁ norm to select key evidences while using the Infinite Push loss function to enforce that positive videos have higher detection scores than negative videos. The Alternating Direction Method of Multipliers (ADMM) algorithm is used to solve the optimization problem. Experiments on large-scale video datasets show that our method can improve detection accuracy while providing the unique capability of discovering the key evidences of each complex event.
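The abstract combines three ingredients: an embedding of each video (a bag of frame and segment instances) via instance similarities, an ℓ₁ penalty that selects evidences, and the Infinite Push ranking loss. The Python sketch below illustrates them on toy data under stated assumptions; it is not the authors' implementation. It assumes a MILES-style embedding (Chen et al., listed in the references), in which each bag is mapped to its maximum Gaussian similarity to each candidate evidence instance, and it uses a hinge surrogate of the Infinite Push loss (Agarwal, 2011) averaged over positive videos. All function names, the bandwidth `sigma`, the trade-off `lam`, and the toy data are hypothetical. The ADMM solver itself is not reproduced; only the ℓ₁ soft-thresholding step that ADMM-style ℓ₁ solvers commonly apply is shown.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating the ideas named in the abstract; this is
# not the authors' ESR implementation.
Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two instance feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def embed_bag(bag: List[Vector], prototypes: List[Vector], sigma: float = 1.0) -> List[float]:
    """Assumed MILES-style embedding: entry k is the Gaussian similarity between
    prototype k and its closest instance (frame or segment) in the bag."""
    return [max(math.exp(-sq_dist(inst, p) / sigma ** 2) for inst in bag)
            for p in prototypes]


def score(w: Vector, z: Vector) -> float:
    """Linear detection score f(z) = w . z."""
    return sum(wi * zi for wi, zi in zip(w, z))


def infinite_push_loss(w: Vector, pos: List[Vector], neg: List[Vector]) -> float:
    """Hinge surrogate of the Infinite Push: for each negative video, average the
    pairwise hinge loss over all positives, then take the worst negative."""
    return max(
        sum(max(0.0, 1.0 - (score(w, zp) - score(w, zn))) for zp in pos) / len(pos)
        for zn in neg
    )


def objective(w: Vector, pos: List[Vector], neg: List[Vector], lam: float) -> float:
    """l1 penalty (drives most evidence weights to zero) + lam * Infinite Push loss."""
    return sum(abs(wi) for wi in w) + lam * infinite_push_loss(w, pos, neg)


def soft_threshold(v: Vector, t: float) -> List[float]:
    """Proximal operator of t * ||.||_1, the shrinkage step used by ADMM-style l1 solvers."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


# Toy data (hypothetical): each video is a bag of 2-D instance features.
pos_bags = [[[0.0, 0.0], [5.0, 5.0]], [[0.1, 0.0], [3.0, 3.0]]]
neg_bags = [[[4.0, 4.0], [5.0, 5.0]]]
prototypes = [[0.0, 0.0], [5.0, 5.0]]  # candidate evidences drawn from training instances
pos = [embed_bag(b, prototypes) for b in pos_bags]
neg = [embed_bag(b, prototypes) for b in neg_bags]

print(infinite_push_loss([2.0, 0.0], pos, neg))  # 0.0: both positives outrank the negative by > 1
print(objective([0.0, 0.0], pos, neg, lam=1.0))   # 1.0: all-zero weights incur the full hinge loss
```

Because the ℓ₁ term pushes many entries of `w` to exactly zero, the prototypes that keep a nonzero weight can be read as the selected key evidences. In the toy data, weight on the first prototype alone already ranks both positive videos above the negative one with zero Infinite Push loss.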

References

Agarwal, S.: The infinite push: A new support vector ranking algorithm that directly optimizes accuracy at the absolute top of the list. In: SDM, pp. 839–850. Society for Industrial and Applied Mathematics (2011)
Bhattacharya, S., Yu, F.X., Chang, S.F.: Minimally needed evidence for complex event recognition in unconstrained videos. In: ICMR (2014)
Cao, L., Mu, Y., Natsev, A., Chang, S.-F., Hua, G., Smith, J.R.: Scene aligned pooling for complex video recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 688–701. Springer, Heidelberg (2012)
Chen, Y., Bi, J., Wang, J.Z.: MILES: Multiple-instance learning via embedded instance selection. PAMI 28(12), 1931–1947 (2006)
Dietterich, T.G., Lathrop, R.H., Lozano-Pérez, T.: Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence 89(1), 31–71 (1997)
Ikizler-Cinbis, N., Sclaroff, S.: Object, scene and actions: Combining multiple features for human action recognition. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part I. LNCS, vol. 6311, pp. 494–507. Springer, Heidelberg (2010)
INRIA: Yael library: Optimized implementations of computationally demanding functions (2009), https://gforge.inria.fr/projects/yael/
Jiang, Y.G., Bhattacharya, S., Chang, S.F., Shah, M.: High-level event recognition in unconstrained videos. IJMIR, 1–29 (2012)
Joachims, T.: Optimizing search engines using clickthrough data. In: SIGKDD, pp. 133–142. ACM (2002)
Li, W., Yu, Q., Divakaran, A., Vasconcelos, N.: Dynamic pooling for complex event recognition. In: ICCV (2013)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV 60(2), 91–110 (2004)
Natarajan, P., Wu, S., Vitaladevuni, S., Zhuang, X., Tsakalidis, S., Park, U., Prasad, R.: Multimodal feature fusion for robust event detection in web videos. In: CVPR (2012)
Niebles, J.C., Chen, C.-W., Fei-Fei, L.: Modeling temporal structure of decomposable motion segments for activity classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part II. LNCS, vol. 6312, pp. 392–405. Springer, Heidelberg (2010)
Oneata, D., Verbeek, J., Schmid, C.: Action and event recognition with Fisher vectors on a compact feature set. In: ICCV, pp. 1817–1824 (2013)
Over, P., Awad, G., Michel, M., Fiscus, J., Sanders, G., Kraaij, W., Smeaton, A.F., Quenot, G.: TRECVID 2013 – an overview of the goals, tasks, data, evaluation mechanisms and metrics. In: Proceedings of TRECVID 2013. NIST (2013)
Quattoni, A., Carreras, X., Collins, M., Darrell, T.: An efficient projection for ℓ1,∞ regularization. In: ICML (2009)
Rakotomamonjy, A.: Sparse support vector infinite push. In: ICML (2012)
Rudin, C.: The p-norm push: A simple convex ranking algorithm that concentrates at the top of the list. JMLR 10, 2233–2271 (2009)
Soomro, K., Zamir, A.R., Shah, M.: UCF101: A dataset of 101 human actions classes from videos in the wild. CRCV-TR-12-01 (2012)
Tamrakar, A., Ali, S., Yu, Q., Liu, J., Javed, O., Divakaran, A., Cheng, H., Sawhney, H.: Evaluation of low-level features and their combinations for complex event detection in open source videos. In: CVPR (2012)
Tang, K., Fei-Fei, L., Koller, D.: Learning latent temporal structure for complex event detection. In: CVPR (2012)
Vahdat, A., Cannons, K., Mori, G., Oh, S., Kim, I.: Compositional models for video event detection: A multiple kernel learning latent variable approach. In: ICCV, pp. 1185–1192 (2013)
Vedaldi, A., Fulkerson, B.: VLFeat: An open and portable library of computer vision algorithms (2008), http://www.vlfeat.org/
Wang, H., Kläser, A., Schmid, C., Liu, C.L.: Action recognition by dense trajectories. In: CVPR (2011)
Wang, H., Kläser, A., Schmid, C., Liu, C.L.: Dense trajectories and motion boundary descriptors for action recognition. IJCV, 1–20 (2013)

Author information

Authors and Affiliations

Graduate Institute of Electrical Engineering, National Taiwan University, Taiwan
Kuan-Ting Lai & Ming-Syan Chen
Research Center for IT Innovation, Academia Sinica, Taiwan
Kuan-Ting Lai & Ming-Syan Chen
Department of Electrical Engineering, Columbia University, USA
Dong Liu & Shih-Fu Chang

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
ESAT - PSI, iMinds, KU Leuven, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

About this paper

Cite this paper

Lai, KT., Liu, D., Chen, MS., Chang, SF. (2014). Recognizing Complex Events in Videos by Learning Key Static-Dynamic Evidences. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8691. Springer, Cham. https://doi.org/10.1007/978-3-319-10578-9_44

DOI: https://doi.org/10.1007/978-3-319-10578-9_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10577-2
Online ISBN: 978-3-319-10578-9
eBook Packages: Computer Science, Computer Science (R0)
