Abstract
In this paper, we propose the velocity pyramid for multimedia event detection. Spatial pyramid matching was recently proposed to introduce coarse geometric information into the Bag of Features framework, and it is effective for static image recognition and detection. In video, temporal information, which represents its dynamic nature, is important in addition to spatial information. To fully exploit it, we propose the velocity pyramid, in which video frames are divided into motional sub-regions. Our method is effective for detecting events characterized by their temporal patterns. Experiments on the MED (Multimedia Event Detection) dataset showed a 10% performance improvement with the velocity pyramid over the same system without it. Furthermore, when combined with the spatial pyramid, the velocity pyramid provides an additional 3% gain in detection performance.
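The abstract does not specify how frames are divided into motional sub-regions, so the following is only a minimal Python sketch under stated assumptions, contrasting a spatial pyramid with a velocity-style pyramid. It assumes each local feature is already quantized to a visual word and carries a normalized position and an optical-flow velocity; the motion sub-regions (one near-static bin plus `n_dirs` equal flow-direction sectors), the `min_speed` threshold, and all function names are hypothetical. Plain hard-assignment, L1-normalized histograms with unweighted concatenation stand in for whatever encoding and level weighting the paper actually uses.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical local feature: (visual_word, x, y, vx, vy), with x, y
# normalised to [0, 1) and (vx, vy) the optical-flow velocity at that point.
Feature = Tuple[int, float, float, float, float]


def bof_histogram(words: Sequence[int], k: int) -> List[float]:
    """Hard-assignment bag-of-features histogram, L1-normalised."""
    hist = [0.0] * k
    for w in words:
        hist[w] += 1.0
    total = sum(hist)
    # An empty region keeps an all-zero histogram.
    return [h / total for h in hist] if total > 0 else hist


def spatial_pyramid(features: Sequence[Feature], k: int,
                    levels: int = 2) -> List[float]:
    """Concatenate histograms over 1x1, 2x2, ... grids (no level weights)."""
    out: List[float] = []
    for level in range(levels):
        n = 2 ** level  # n x n grid of cells at this level
        cells: List[List[int]] = [[] for _ in range(n * n)]
        for w, x, y, _, _ in features:
            cx, cy = min(int(x * n), n - 1), min(int(y * n), n - 1)
            cells[cy * n + cx].append(w)
        for cell in cells:
            out.extend(bof_histogram(cell, k))
    return out


def velocity_bin(vx: float, vy: float, n_dirs: int, min_speed: float) -> int:
    """Assumed motion sub-region: 0 = near-static, 1..n_dirs = direction sector."""
    if math.hypot(vx, vy) < min_speed:
        return 0
    # Angle in [0, 2*pi]; rounding can yield exactly 2*pi, hence the min().
    angle = math.atan2(vy, vx) % (2 * math.pi)
    return 1 + min(int(angle / (2 * math.pi / n_dirs)), n_dirs - 1)


def velocity_pyramid(features: Sequence[Feature], k: int, n_dirs: int = 4,
                     min_speed: float = 0.5) -> List[float]:
    """Level 0: whole frame; level 1: one histogram per motion sub-region."""
    out = bof_histogram([f[0] for f in features], k)
    regions: List[List[int]] = [[] for _ in range(n_dirs + 1)]
    for w, _, _, vx, vy in features:
        regions[velocity_bin(vx, vy, n_dirs, min_speed)].append(w)
    for region in regions:
        out.extend(bof_histogram(region, k))
    return out


if __name__ == "__main__":
    feats = [
        (0, 0.1, 0.1, 2.0, 0.0),    # moving right, top-left
        (1, 0.9, 0.1, 0.0, 0.0),    # static, top-right
        (2, 0.1, 0.9, -3.0, -1.0),  # moving left, bottom-left
        (2, 0.6, 0.7, -1.0, 1.0),   # moving diagonally, bottom-right
    ]
    print(velocity_pyramid(feats, k=3))
    print(spatial_pyramid(feats, k=3))
```

In this toy example the two features with word 2 share a global histogram bin but land in different motion sub-regions, so the velocity pyramid keeps them apart by how they move, just as the spatial grid keeps them apart by where they are. Concatenating the two vectors is one simple, assumed way to combine them; the paper may fuse them differently.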
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Liang, Z., Inoue, N., Shinoda, K. (2014). Event Detection by Velocity Pyramid. In: Gurrin, C., Hopfgartner, F., Hurst, W., Johansen, H., Lee, H., O’Connor, N. (eds) MultiMedia Modeling. MMM 2014. Lecture Notes in Computer Science, vol 8325. Springer, Cham. https://doi.org/10.1007/978-3-319-04114-8_30
DOI: https://doi.org/10.1007/978-3-319-04114-8_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04113-1
Online ISBN: 978-3-319-04114-8
eBook Packages: Computer Science, Computer Science (R0)