Resource Constrained Multimedia Event Detection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8325)

Abstract

We present a study comparing the cost and efficiency trade-offs of multiple features for multimedia event detection. Low-level and semantic features are a critical part of contemporary multimedia and computer vision research. Arguably, combinations of multiple feature sets have been a major reason for recent progress in the field, not just as low-dimensional representations of multimedia data, but also as a means to semantically summarize images and videos. However, their efficacy for complex event recognition in unconstrained videos on standardized datasets has not been systematically studied. In this paper, we evaluate the accuracy and contribution of more than 10 multi-modality features, including semantic and low-level video representations, using two newly released NIST TRECVID Multimedia Event Detection (MED) open source datasets, MEDTEST and KINDREDTEST, which together contain more than 1000 hours of video. Contrasting multiple performance metrics, such as average precision, probability of missed detection, and minimum normalized detection cost, we propose a framework to balance the trade-off between accuracy and computational cost. This study provides an empirical foundation for selecting feature sets that can handle large-scale data with limited computational resources and are likely to produce superior multimedia event detection accuracy. The framework also applies to other resource-limited multimedia analyses, such as selecting or fusing multiple classifiers and choosing among different representations of each feature set.
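The three metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names are hypothetical, and the cost parameters (`c_miss=80`, `c_fa=1`, `p_target=0.001`) are assumed defaults in the style of the TRECVID MED normalized detection cost, so they should be checked against the official evaluation plan before use.

```python
from typing import Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values at each rank where a positive video appears."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def miss_and_false_alarm(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (P_miss, P_FA) when videos scoring >= threshold are declared detections."""
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    missed = sum(1 for s, l in zip(scores, labels) if l and s < threshold)
    false_alarms = sum(1 for s, l in zip(scores, labels) if not l and s >= threshold)
    p_miss = missed / n_pos if n_pos else 0.0
    p_fa = false_alarms / n_neg if n_neg else 0.0
    return p_miss, p_fa


def min_ndc(
    scores: Sequence[float],
    labels: Sequence[int],
    c_miss: float = 80.0,    # assumed cost of a missed detection
    c_fa: float = 1.0,       # assumed cost of a false alarm
    p_target: float = 0.001, # assumed prior probability of the event
) -> float:
    """Minimum normalized detection cost over all candidate thresholds."""
    norm = min(c_miss * p_target, c_fa * (1.0 - p_target))
    # Every distinct score is a candidate threshold; +inf rejects everything.
    thresholds = sorted(set(scores)) + [float("inf")]
    best = float("inf")
    for threshold in thresholds:
        p_miss, p_fa = miss_and_false_alarm(scores, labels, threshold)
        cost = (c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)) / norm
        best = min(best, cost)
    return best
```

Given per-feature values of such metrics together with a measured extraction cost for each feature, candidate feature sets can be compared on accuracy per unit of computation, which is the kind of trade-off the abstract describes.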


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Lan, ZZ., Yang, Y., Ballas, N., Yu, SI., Hauptmann, A. (2014). Resource Constrained Multimedia Event Detection. In: Gurrin, C., Hopfgartner, F., Hurst, W., Johansen, H., Lee, H., O’Connor, N. (eds) MultiMedia Modeling. MMM 2014. Lecture Notes in Computer Science, vol 8325. Springer, Cham. https://doi.org/10.1007/978-3-319-04114-8_33

  • DOI: https://doi.org/10.1007/978-3-319-04114-8_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04113-1

  • Online ISBN: 978-3-319-04114-8

  • eBook Packages: Computer Science, Computer Science (R0)
