Multimedia Tools and Applications, Volume 77, Issue 3, pp 3509–3532

An efficient multi-feature SVM solver for complex event detection

  • Huan Liu
  • Qinghua Zheng
  • Zhihui Li
  • Tao Qin
  • Lei Zhu

Abstract

Multimedia event detection (MED) has become one of the most important visual content analysis tools with the rapid growth of user-generated videos on the Internet. Multimedia data is generally represented by multiple features, and it is difficult to obtain good performance on complex event detection with only a single feature. The crucial problem for MED with multiple features is therefore how to fuse them effectively. Meanwhile, exploiting multiple features simultaneously in large-scale scenarios imposes a heavy computational burden. To address these two issues, in this paper we propose a self-adaptive multi-feature learning framework with an efficient Support Vector Machine (SVM) solver for complex event detection. Our model combines multiple features through an adaptively weighted linear combination, which is simple yet effective, with the weights reflecting the different impact each feature has on a specific event. To mitigate the expensive computational cost, we employ a fast primal SVM solver within the proposed alternating optimization algorithm to obtain an approximate solution with a gradient descent method. Extensive experimental results on the standard TRECVID MEDTest 2013 and 2014 datasets demonstrate the effectiveness and superiority of the proposed framework for complex event detection.
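The abstract describes the model only at a high level: per-feature weights in a linear combination, a primal SVM optimized by gradient descent, and an alternating optimization between the two. The Python/NumPy sketch below illustrates that general pattern. It is not the authors' formulation or solver. The following choices are assumptions made only for illustration: the decision function f(x) = sum_v theta_v * w_v^T x_v + b, the squared-hinge primal loss with an L2 penalty lam/2 * sum_v ||w_v||^2, the constraint that theta lies on the probability simplex, and a projected-gradient update for theta. All function names, step sizes, and the toy data are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {t : t >= 0, sum(t) = 1} (sort-based algorithm)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                                  # sort in descending order
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1.0) / idx > 0)[0][-1]   # last index meeting the condition
    tau = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - tau, 0.0)


def decision(views, theta, ws, b):
    """Assumed decision function: f(x) = sum_v theta_v * w_v^T x_v + b for every sample."""
    return sum(t * (X @ w) for t, X, w in zip(theta, views, ws)) + b


def fit_weighted_multifeature_svm(views, y, lam=1e-2, lr=0.1, theta_lr=0.05,
                                  outer_iters=20, inner_iters=100):
    """Hypothetical alternating optimization for a weighted multi-feature linear SVM.

    views: list of (n, d_v) feature matrices, one per feature type.
    y: labels in {-1, +1}.
    Assumed objective: lam/2 * sum_v ||w_v||^2 + 1/n * sum_i max(0, 1 - y_i f(x_i))^2,
    with theta constrained to the probability simplex.
    """
    n = y.shape[0]
    theta = np.full(len(views), 1.0 / len(views))        # start from equal feature weights
    ws = [np.zeros(X.shape[1]) for X in views]
    b = 0.0
    for _ in range(outer_iters):
        # Step 1: theta fixed, gradient descent on the primal squared-hinge SVM in (w, b).
        for _ in range(inner_iters):
            margin = np.maximum(0.0, 1.0 - y * decision(views, theta, ws, b))
            g = -2.0 / n * margin * y                    # d(loss)/d f_i
            ws = [w - lr * (lam * w + t * (X.T @ g)) for t, X, w in zip(theta, views, ws)]
            b -= lr * g.sum()
        # Step 2: (w, b) fixed, one projected gradient step on the feature weights theta.
        scores = np.column_stack([X @ w for X, w in zip(views, ws)])  # s_iv = w_v^T x_iv
        margin = np.maximum(0.0, 1.0 - y * (scores @ theta + b))
        grad_theta = scores.T @ (-2.0 / n * margin * y)
        theta = project_to_simplex(theta - theta_lr * grad_theta)
    return theta, ws, b


if __name__ == "__main__":
    # Toy data: one feature type shifted by the label, one unrelated to it.
    rng = np.random.default_rng(0)
    n = 200
    y = rng.choice([-1.0, 1.0], size=n)
    informative = y[:, None] + 0.5 * rng.standard_normal((n, 5))
    noise = rng.standard_normal((n, 5))
    theta, ws, b = fit_weighted_multifeature_svm([informative, noise], y)
    acc = np.mean(np.sign(decision([informative, noise], theta, ws, b)) == y)
    print("feature weights:", theta, "training accuracy:", acc)
```

In this sketch, each block update is a gradient step on a convex sub-problem. With small enough step sizes, the objective therefore does not increase from one alternation to the next. Constraining theta to the simplex lets a feature type receive a weight of exactly zero, which removes it from the combination. On real MED features, lam, the step sizes, and the iteration counts would need tuning, and a dedicated fast primal solver would replace the plain gradient loop.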

Keywords

Multimedia event detection · Multi-feature learning · SVM solver

Notes

Acknowledgments

This work was supported in part by “The Fundamental Theory and Applications of Big Data with Knowledge Engineering” under the National Key Research and Development Program of China (grant No. 2016YFB1000903); the Ministry of Education Innovation Research Team (No. IRT_17R86); the Project of China Knowledge Centre for Engineering Science and Technology; and the National Science Foundation of China (Grant No. 61502377).

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Huan Liu (1)
  • Qinghua Zheng (1)
  • Zhihui Li (2)
  • Tao Qin (1)
  • Lei Zhu (3)
  1. MOEKLINNS Lab, Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  2. Beijing Etrol Technologies Co., Ltd., Beijing, China
  3. School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, Australia
