Double Fusion for Multimedia Event Detection

  • Conference paper
Advances in Multimedia Modeling (MMM 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7131)

Abstract

Multimedia Event Detection is a multimedia retrieval task whose goal is to find videos of a particular event in an internet video archive, given example videos and descriptions. We focus here on mining the example videos to learn their most characteristic features, which requires combining multiple complementary feature types. Early fusion and late fusion are two popular combination strategies. The former fuses features before classification; the latter combines the outputs of classifiers trained on different features. In this paper, we introduce a fusion scheme named double fusion, which combines early fusion and late fusion to incorporate the advantages of both. Results are reported on the TRECVID MED 2010 and 2011 data sets. For MED 2010, we obtain a mean minimal normalized detection cost (MNDC) of 0.49, which improves on the state-of-the-art performance by more than 12 percent.
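The three strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so the nearest-centroid scorer, the unweighted score averaging, and all function names below are assumptions chosen only to keep the example self-contained. Early fusion is shown as feature concatenation, late fusion as averaging classifier scores, and double fusion as late fusion over both the per-feature classifiers and a classifier trained on the early-fused features.

```python
import numpy as np

def train_centroid_scorer(X, y):
    """Hypothetical stand-in classifier (not from the paper).

    Returns a function that scores samples: higher means closer to the
    positive-class centroid than to the negative-class centroid.
    """
    pos = X[y == 1].mean(axis=0)
    neg = X[y == 0].mean(axis=0)

    def score(X_new):
        d_neg = np.linalg.norm(X_new - neg, axis=1)
        d_pos = np.linalg.norm(X_new - pos, axis=1)
        return d_neg - d_pos

    return score

def early_fusion(feature_blocks):
    """Early fusion: concatenate feature types before classification."""
    return np.hstack(feature_blocks)

def late_fusion(score_lists):
    """Late fusion: combine classifier outputs (here, a plain average)."""
    return np.mean(np.vstack(score_lists), axis=0)

def double_fusion(train_blocks, y, test_blocks):
    """Assumed form of double fusion: late-fuse the scores of the
    per-feature classifiers together with the score of a classifier
    trained on the early-fused features."""
    scores = [
        train_centroid_scorer(tr, y)(te)
        for tr, te in zip(train_blocks, test_blocks)
    ]
    fused_scorer = train_centroid_scorer(early_fusion(train_blocks), y)
    scores.append(fused_scorer(early_fusion(test_blocks)))
    return late_fusion(scores)

# Toy data: two feature types for four training videos (labels 0/1).
y = np.array([0, 0, 1, 1])
train_a = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
train_b = np.array([[0.0], [1.0], [10.0], [11.0]])
test_a = np.array([[0.0, 0.5], [5.0, 5.5]])
test_b = np.array([[0.5], [10.5]])

scores = double_fusion([train_a, train_b], y, [test_a, test_b])
```

In this toy setup the first test video resembles the negative examples and the second resembles the positive ones, so the fused score is negative for the first and positive for the second. A practical system would likely normalize the classifier scores before averaging, since scores from different feature types are not on a common scale.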


© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lan, Zz., Bao, L., Yu, SI., Liu, W., Hauptmann, A.G. (2012). Double Fusion for Multimedia Event Detection. In: Schoeffmann, K., Merialdo, B., Hauptmann, A.G., Ngo, CW., Andreopoulos, Y., Breiteneder, C. (eds) Advances in Multimedia Modeling. MMM 2012. Lecture Notes in Computer Science, vol 7131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27355-1_18

  • DOI: https://doi.org/10.1007/978-3-642-27355-1_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27354-4

  • Online ISBN: 978-3-642-27355-1

  • eBook Packages: Computer Science, Computer Science (R0)
