Abstract
How to fuse static and dynamic information is a key issue in event analysis. In this paper, we present a novel approach that combines appearance and motion information in a top-down manner for event recognition in real videos. Unlike conventional bottom-up processing, attention can be directed volitionally by top-down signals derived from task demands. A video is represented by a collection of spatio-temporal features, called video words, obtained by quantizing the spatio-temporal interest points (STIPs) extracted from the video. We propose two approaches to build class-specific visual or motion histograms from these features. The first uses the probability of a class given a visual or motion word: a high probability indicates that more attention should be paid to that word. The second, which also incorporates the negative information carried by each word, uses the mutual information between each word and the event label: high mutual information indicates high relevance between the word and the class. Both methods not only characterize the two aspects of an event but also select the words that are discriminative for the corresponding event. Experimental results on the TRECVID 2005 and HOHA video corpora demonstrate that the proposed method improves the mean average precision.
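The two top-down weighting schemes described above can be sketched as follows. This is a minimal illustration, not the authors' exact formulation: function names are our own, and the mutual-information term is simplified to a pointwise score per (class, word) pair computed from word-occurrence mass.

```python
import numpy as np

def class_word_weights(counts, labels, n_classes, eps=1e-10):
    """Per-class word weights from a bag-of-words corpus.

    counts: (n_videos, n_words) histogram matrix of video-word counts
    labels: (n_videos,) integer event labels
    Returns p(class | word) and a pointwise mutual-information score
    for every (class, word) pair.
    """
    n_words = counts.shape[1]
    # total occurrences of each word within each class
    per_class = np.zeros((n_classes, n_words))
    for c in range(n_classes):
        per_class[c] = counts[labels == c].sum(axis=0)

    # p(c | w): how strongly word w votes for class c
    total_per_word = per_class.sum(axis=0) + eps
    p_c_given_w = per_class / total_per_word

    # pointwise MI over the joint (class, word) occurrence mass
    joint = per_class / (per_class.sum() + eps)
    p_w = joint.sum(axis=0, keepdims=True)   # marginal over words
    p_c = joint.sum(axis=1, keepdims=True)   # marginal over classes
    pmi = np.log((joint + eps) / (p_w * p_c + eps))
    return p_c_given_w, pmi

def weighted_histogram(hist, class_weights):
    """Re-weight one video's histogram by per-word relevance to a class."""
    h = hist * class_weights
    s = h.sum()
    return h / s if s > 0 else h
```

A word that occurs almost exclusively in one event class receives a p(c | w) near 1 and a large positive PMI for that class, so the re-weighted histogram emphasizes exactly the discriminative words; words spread evenly across classes are down-weighted.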
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Li, L., Yuan, C., Hu, W., Li, B. (2011). Top-Down Cues for Event Recognition. In: Kimmel, R., Klette, R., Sugimoto, A. (eds) Computer Vision – ACCV 2010. ACCV 2010. Lecture Notes in Computer Science, vol 6494. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19318-7_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19317-0
Online ISBN: 978-3-642-19318-7