An efficient multi-feature SVM solver for complex event detection

Multimedia Tools and Applications

Abstract

Multimedia event detection (MED) has become one of the most important visual content analysis tools with the rapid growth of user-generated videos on the Internet. Multimedia data is generally represented by multiple features, and it is difficult to achieve good performance on complex event detection with only a single feature. How to fuse different features effectively is therefore the crucial problem for MED with multiple features. Meanwhile, exploiting multiple features simultaneously in large-scale scenarios always produces a heavy computational burden. To address these two issues, we propose a self-adaptive multi-feature learning framework with an efficient Support Vector Machine (SVM) solver for complex event detection. Our model combines multiple features through an adaptively weighted linear combination, which is simple yet effective, according to the varying impact that different features have on a specific event. To mitigate the expensive computational cost, we employ a fast primal SVM solver in the proposed alternating optimization algorithm to obtain an approximate solution with a gradient descent method. Extensive experimental results on the standard TRECVID MEDTest 2013 and 2014 datasets demonstrate the effectiveness and superiority of the proposed framework on complex event detection.
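
The abstract does not give the objective function or the update rules, so the following is only a minimal sketch of the general idea it describes: an adaptively weighted linear combination of per-feature linear models, trained by alternating optimization with gradient descent in the primal. The squared hinge loss, the inverse-loss rule for the feature weights, and all names (`fit_weighted_views`, `alpha`, `reg`, `lr`) are assumptions made for illustration and should not be read as the authors' formulation.

```python
import numpy as np


def squared_hinge(scores, y):
    """Mean squared hinge loss and its gradient with respect to the scores."""
    margin = np.maximum(0.0, 1.0 - y * scores)
    return np.mean(margin ** 2), -2.0 * y * margin / len(y)


def fit_weighted_views(views, y, n_outer=10, n_inner=200, lr=0.1, reg=1e-3):
    """Alternate between per-view primal weights and view weights alpha.

    views: list of (n_samples, d_v) arrays, one per feature type.
    y:     labels in {-1, +1}.
    All hyperparameters are illustrative assumptions.
    """
    alpha = np.full(len(views), 1.0 / len(views))
    ws = [np.zeros(X.shape[1]) for X in views]
    for _outer in range(n_outer):
        # Step 1: alpha fixed, gradient descent on the primal weights.
        for _inner in range(n_inner):
            scores = sum(a * (X @ w) for a, X, w in zip(alpha, views, ws))
            _, g = squared_hinge(scores, y)
            ws = [w - lr * (a * (X.T @ g) + reg * w)
                  for a, X, w in zip(alpha, views, ws)]
        # Step 2: weights fixed, views with lower loss get a larger alpha
        # (an assumed heuristic, not the rule from the paper).
        losses = np.array([squared_hinge(X @ w, y)[0]
                           for X, w in zip(views, ws)])
        alpha = 1.0 / (losses + 1e-8)
        alpha /= alpha.sum()
    return alpha, ws
```

Called with one informative feature matrix and one pure-noise matrix, a sketch like this should shift most of the weight in `alpha` onto the informative feature, which is the behaviour the adaptive weighting is meant to capture. Every inner step costs time linear in the number of samples and feature dimensions, which is the practical appeal of a primal gradient solver in large-scale settings.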


Acknowledgments

This work was supported in part by “The Fundamental Theory and Applications of Big Data with Knowledge Engineering” under the National Key Research and Development Program of China with grant No. 2016YFB1000903; Ministry of Education Innovation Research Team No. IRT_17R86; Project of China Knowledge Centre for Engineering Science and Technology; and the National Science Foundation of China under Grant No. 61502377.

Author information

Corresponding author

Correspondence to Huan Liu.

About this article

Cite this article

Liu, H., Zheng, Q., Li, Z. et al. An efficient multi-feature SVM solver for complex event detection. Multimed Tools Appl 77, 3509–3532 (2018). https://doi.org/10.1007/s11042-017-5166-z

  • DOI: https://doi.org/10.1007/s11042-017-5166-z
