Resource Constrained Multimedia Event Detection

Lan, Zhen-Zhong; Yang, Yi; Ballas, Nicolas; Yu, Shoou-I; Haputmann, Alexander

doi:10.1007/978-3-319-04114-8_33

Zhen-Zhong Lan²²,
Yi Yang²²,
Nicolas Ballas²²,
Shoou-I Yu²² &
…
Alexander Haputmann²²

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8325))

Included in the following conference series:

International Conference on Multimedia Modeling

3357 Accesses
2 Citations

Abstract

We present a study comparing the cost and efficiency tradeoffs of multiple features for multimedia event detection. Low-level as well as semantic features are a critical part of contemporary multimedia and computer vision research. Arguably, combinations of multiple feature sets have been a major reason for recent progress in the field, not just as a low dimensional representations of multimedia data, but also as a means to semantically summarize images and videos. However, their efficacy for complex event recognition in unconstrained videos on standardized datasets has not been systematically studied. In this paper, we evaluate the accuracy and contribution of more than 10 multi-modality features, including semantic and low-level video representations, using two newly released NIST TRECVID Multimedia Event Detection (MED) open source datasets, i.e. MEDTEST and KINDREDTEST, which contain more than 1000 hours of videos. Contrasting multiple performance metrics, such as average precision, probability of missed detection and minimum normalized detection cost, we propose a framework to balance the trade-off between accuracy and computational cost. This study provides an empirical foundation for selecting feature sets that are capable of dealing with large-scale data with limited computational resources and are likely to produce superior multimedia event detection accuracy. This framework also applies to other resource limited multimedia analyses such as selecting/fusing multiple classifiers and different representations of each feature set.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

http://www.psc.edu/index.php/computing-resources/blacklight
Bao, L., Yu, S.-I., Lan, Z.Z., Overwijk, A., Jin, Q., Langner, B., Garbus, M., Burger, S., Metze, F., Hauptmann, A.: Informedia@ trecvid 2011. In: TRECVID 2011 (2011)
Google Scholar
Bay, H., Tuytelaars, T., Van Gool, L.: Surf: Speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006, Part I. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006)
Chapter Google Scholar
Chen, M.-Y., Hauptmann, A.: Mosift: Recognizing human actions in surveillance videos. CMU-CS-09-161 (2009)
Google Scholar
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR, vol. 1, pp. 886–893. IEEE (2005)
Google Scholar
Ebadollahi, S., Chang, S.-F., Xie, L., Smith John, R.: Visual event detection using multi-dimensional concept semantics. In: ICME, pp. 881–884 (2006)
Google Scholar
Jiang, Y.-G.: Super: Towards real-time event recognition in internet videos. In: ICMR, p. 7. ACM (2012)
Google Scholar
Krizhevsky, A., Sutskever, I., Hinton, G.: Imagenet classification with deep convolutional neural networks. In: NIPS, pp. 1106–1114 (2012)
Google Scholar
Lan, Z.-z., Bao, L., Yu, S.-I., Liu, W., Hauptmann, A.G.: Double fusion for multimedia event detection. In: Schoeffmann, K., Merialdo, B., Hauptmann, A.G., Ngo, C.-W., Andreopoulos, Y., Breiteneder, C. (eds.) MMM 2012. LNCS, vol. 7131, pp. 173–185. Springer, Heidelberg (2012)
Chapter Google Scholar
Lan, Z.-Z., Bao, L., Yu, S.-I., Liu, W., Hauptmann, A.G.: Multimedia classification and event detection using double fusion. Multimedia Tools and Applications, 1–15 (2013)
Google Scholar
Laptev, I.: On space-time interest points. IJCV 64(2-3), 107–123 (2005)
Article Google Scholar
Li, L.-J., Su, H., Fei-Fei, L., Xing, E.P.: Object bank: A high-level image representation for scene classification & semantic feature sparsification. In: NIPS, pp. 1378–1386 (2010)
Google Scholar
Liu, J., Yu, Q., Javed, O., Ali, S., Tamrakar, A., Divakaran, A., Cheng, H., Sawhney, H.S.: Video event recognition using concept attributes. In: WACV, pp. 339–346 (2013)
Google Scholar
Merler, M., Member, S., Huang, B., Xie, L., Hua, G.: Semantic Model vectors for complex video event recognition. IEEE Trans. on Multimedia 14(1), 88–101 (2012)
Article Google Scholar
Moosmann, F., Nowak, E., Jurie, F.: Randomized clustering forests for image classification. PAMI 30(9), 1632–1646 (2008)
Article Google Scholar
Over, P., Awad, G., Michel, M., Fiscus, J., Sanders, G., Shaw, B., Kraaij, W., Smeaton, A.F., Quéenot, G.: Trecvid 2012 – an overview of the goals, tasks, data, evaluation mechanisms and metrics. In: TRECVID. NIST, USA (2012)
Google Scholar
Tamrakar, A., Ali, S., Yu, Q., Liu, J., Javed, O., Divakaran, A., Cheng, H., Sawhney, H., International Sarnoff, S.R.I.: Evaluation of low-level leatures and their combinations for complex event detection in open source videos. In: CVPR, pp. 3681–3688 (2012)
Google Scholar
Van De Sande, K.E.A., Gevers, T., Cees, G.M.S.: Evaluating color descriptors for object and scene recognition. PAMI 32(9), 1582–1596 (2010)
Article Google Scholar
Wang, H., Klaser, A., Schmid, C., Liu, C.-L.: Action recognition by dense trajectories. In: CVPR, pp. 3169–3176. IEEE (2011)
Google Scholar
Yang, J., Jiang, Y.-G., Hauptmann, A.G., Ngo, C.-W.: Evaluating bag-of-visual-words representations in scene classification. In: Workshop on ICMR, pp. 197–206. ACM (2007)
Google Scholar

Download references

Author information

Authors and Affiliations

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Zhen-Zhong Lan, Yi Yang, Nicolas Ballas, Shoou-I Yu & Alexander Haputmann

Authors

Zhen-Zhong Lan
View author publications
You can also search for this author in PubMed Google Scholar
Yi Yang
View author publications
You can also search for this author in PubMed Google Scholar
Nicolas Ballas
View author publications
You can also search for this author in PubMed Google Scholar
Shoou-I Yu
View author publications
You can also search for this author in PubMed Google Scholar
Alexander Haputmann
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Computing, Dublin City University, Dublin 9, Ireland
Cathal Gurrin
Fakultät IV für Elektrotechnik und Informatik, Technische Universität Berlin / DAI-Labor, 10587, Berlin, Germany
Frank Hopfgartner
Department of Information and Computing Sciences, Universiteit Utrecht, 3584 CC, Utrecht, The Netherlands
Wolfgang Hurst
UiT The Arctic University of Norway, 9019, Tromsø, Norway
Håvard Johansen
Singapore University of Technology and Design, Singapore
Hyowon Lee
School of Electrical Engineering, Dublin City University, Ireland
Noel O’Connor

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Lan, ZZ., Yang, Y., Ballas, N., Yu, SI., Haputmann, A. (2014). Resource Constrained Multimedia Event Detection. In: Gurrin, C., Hopfgartner, F., Hurst, W., Johansen, H., Lee, H., O’Connor, N. (eds) MultiMedia Modeling. MMM 2014. Lecture Notes in Computer Science, vol 8325. Springer, Cham. https://doi.org/10.1007/978-3-319-04114-8_33

Download citation

DOI: https://doi.org/10.1007/978-3-319-04114-8_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04113-1
Online ISBN: 978-3-319-04114-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics