Multimedia Tools and Applications, Volume 74, Issue 2, pp 505–521

Max-margin adaptive model for complex video pattern recognition

  • Litao Yu
  • Jie Shao
  • Xin-Shun Xu
  • Heng Tao Shen


Pattern recognition models are widely used in applications ranging from video concept annotation to event detection. In this paper, we propose a new framework, the max-margin adaptive (MMA) model, for complex video pattern recognition; it can exploit a large number of unlabeled videos to assist model training. From a statistical perspective, the MMA model accounts for the consistency between the data distributions of the labeled training videos and the unlabeled auxiliary videos by learning an optimal mapping function. The same mapping also widens the margin between positively and negatively labeled videos, which improves the robustness of the model. Experiments are conducted on two public datasets: CCV for video object/event detection and HMDB for action recognition. The results demonstrate that the proposed MMA model is effective on complex video pattern recognition tasks and outperforms state-of-the-art algorithms.
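
The abstract does not state the MMA objective, so the following is only a minimal sketch of the general idea it describes, under several assumptions that are ours and not the authors': the mapping is taken to be a linear scorer `w`, the margin term is a standard hinge loss, and distribution consistency is approximated by the squared gap between the mean projections of the labeled and unlabeled sets (a linear-kernel mean discrepancy). The names `hinge_loss`, `mean_discrepancy`, `objective`, `lam` and `reg` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(rows: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty set of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def hinge_loss(w: Vector, b: float, X: Sequence[Vector], y: Sequence[int]) -> float:
    """Mean hinge loss of the scorer f(x) = w.x + b; labels are in {-1, +1}."""
    losses = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return sum(losses) / len(losses)


def mean_discrepancy(w: Vector, X_labeled: Sequence[Vector],
                     X_unlabeled: Sequence[Vector]) -> float:
    """Squared gap between the mean projections of the two sets onto w.

    Assumption: a linear-kernel stand-in for "distribution consistency".
    """
    gap = [m_l - m_u for m_l, m_u in
           zip(mean_vector(X_labeled), mean_vector(X_unlabeled))]
    return dot(w, gap) ** 2


def objective(w: Vector, b: float, X_l: Sequence[Vector], y_l: Sequence[int],
              X_u: Sequence[Vector], lam: float = 1.0, reg: float = 0.1) -> float:
    """Margin term + lam * consistency term + L2 regularizer (illustrative only)."""
    return (hinge_loss(w, b, X_l, y_l)
            + lam * mean_discrepancy(w, X_l, X_u)
            + 0.5 * reg * dot(w, w))
```

In this toy form, a candidate `w` is penalized both when labeled videos fall inside the margin and when the labeled and unlabeled sets project to different means, which is one simple way to let unlabeled data influence training. The model in the paper may differ in its mapping, its discrepancy measure, and its optimization.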


Keywords: Video pattern recognition; Max-margin adaptive model; Event detection



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Litao Yu (1, corresponding author)
  • Jie Shao (2)
  • Xin-Shun Xu (3)
  • Heng Tao Shen (1)
  1. School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia
  2. Department of Computer Science, National University of Singapore, Singapore, Singapore
  3. School of Computer Science and Technology, Shandong University, Jinan, People’s Republic of China
