Top-Down Cues for Event Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 6494))

Abstract

How to fuse static and dynamic information is a key issue in event analysis. In this paper, we present a novel approach that combines appearance and motion information in a top-down manner for event recognition in real videos. Unlike the conventional bottom-up way, attention can be focused volitionally through top-down signals derived from task demands. A video is represented by a collection of spatio-temporal features, called video words, obtained by quantizing the spatio-temporal interest points (STIPs) extracted from the video. We propose two approaches to building class-specific visual or motion histograms over these features. The first uses the probability of a class given a visual or motion word: a high probability means more attention should be paid to that word. The second, in order to also incorporate negative information for each word, uses the mutual information between each word and the event label: high mutual information means high relevance between the word and the class label. Both methods not only characterize two aspects of an event but also select the words that are discriminative for the corresponding event. Experimental results on the TRECVID 2005 and HOHA video corpora demonstrate that the proposed method improves mean average precision.
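The two word-weighting schemes described above can be estimated from word/label co-occurrence counts alone. The following sketch illustrates both under stated assumptions: the toy count matrix, binary labels, and function names are hypothetical and are not the authors' data or code.

```python
import numpy as np

# Hypothetical toy corpus: rows are videos, columns are video words
# (counts of quantized STIPs); one binary event label per video.
counts = np.array([
    [4, 0, 1],
    [3, 1, 0],
    [0, 5, 2],
    [1, 4, 3],
], dtype=float)
labels = np.array([0, 0, 1, 1])

def class_given_word(counts, labels, c):
    """Estimate P(class = c | word w) from occurrence counts."""
    total = counts.sum(axis=0)                  # occurrences of each word overall
    in_class = counts[labels == c].sum(axis=0)  # occurrences within class c
    return in_class / np.maximum(total, 1e-12)

def word_label_mi(counts, labels):
    """Mutual information I(w; class), one value per word."""
    n_classes = labels.max() + 1
    joint = np.stack([counts[labels == c].sum(axis=0)
                      for c in range(n_classes)])
    joint = joint / joint.sum()                 # joint P(c, w)
    pw = joint.sum(axis=0)                      # marginal P(w)
    pc = joint.sum(axis=1, keepdims=True)       # marginal P(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log2(joint / (pc * pw))
    return np.nansum(terms, axis=0)             # sum over classes per word

pcw = class_given_word(counts, labels, c=1)  # attention weight per word
mi = word_label_mi(counts, labels)           # relevance of each word to the label
```

On this toy data, word 0 (strongly tied to class 0) receives the highest mutual information, while P(class 1 | word) is highest for words 1 and 2; either score can then reweight the class-specific visual or motion histograms.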


References

  1. http://www-nlpir.nist.gov/projects/trecvid

  2. Wang, F., Jiang, Y.G., Ngo, C.W.: Video event detection using motion relativity and visual relatedness. In: Proceedings of the 16th ACM International Conference on Multimedia (2008)

  3. Xu, D., Chang, S.F.: Video event recognition using kernel methods with multilevel temporal alignment. IEEE Trans. Pattern Analysis and Machine Intelligence 30 (2008)

  4. Zhou, X., Zhuang, X., Yan, S., Chang, S.F., Hasegawa-Johnson, M., Huang, T.S.: SIFT-bag kernel for video event analysis. In: Proceedings of the 16th ACM International Conference on Multimedia (2008)

  5. Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: IEEE Conference on Computer Vision and Pattern Recognition (2008)

  6. Dollár, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65–72 (2005)

  7. Jiang, Y.G., Ngo, C.W., Yang, J.: Towards optimal bag-of-features for object categorization and semantic video retrieval. In: Proceedings of ACM International Conference on Image and Video Retrieval, vol. 46 (2007)

  8. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, 91–110 (2004)

  9. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886–893 (2005)

  10. Scovanner, P., Ali, S., Shah, M.: A 3-dimensional SIFT descriptor and its application to action recognition. In: Proceedings of the 15th ACM International Conference on Multimedia (2007)

  11. Ke, Y., Sukthankar, R., Hebert, M.: Efficient visual event detection using volumetric features. In: International Conference on Computer Vision, pp. 166–173 (2005)

  12. Zhang, Z., Hu, Y., Chan, S., Chia, L.-T.: Motion Context: A New Representation for Human Action Recognition. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part IV. LNCS, vol. 5305, pp. 817–829. Springer, Heidelberg (2008)

  13. Chaudhry, R., Ravichandran, A., Hager, G., Vidal, R.: Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions. In: IEEE Conference on Computer Vision and Pattern Recognition (2009)

  14. Efros, A.A., Berg, A.C., Mori, G., Malik, J.: Recognizing action at a distance. In: IEEE International Conference on Computer Vision, pp. 726–733 (2003)

  15. Khan, F.S., van de Weijer, J., Vanrell, M.: Top-down color attention for object recognition. In: International Conference on Computer Vision (2009)

  16. Liu, J., Shah, M.: Learning human actions via information maximization. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1996–2003 (2008)

  17. Yuan, J., Liu, Z., Wu, Y.: Discriminative subvolume search for efficient action detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2442–2449 (2009)

  18. Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. In: Proceedings of British Machine Vision Conference, pp. 299–318 (2006)

  19. Chen, X., Zelinsky, G.J.: Real-world visual search is dominated by top-down guidance. Vision Research 46, 4118–4133 (2006)

  20. DTO Challenge Workshop on Large Scale Concept Ontology for Multimedia: Revision of LSCOM event/activity annotations. Columbia University ADVENT Technical Report (2006)

  21. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, L., Yuan, C., Hu, W., Li, B. (2011). Top-Down Cues for Event Recognition. In: Kimmel, R., Klette, R., Sugimoto, A. (eds) Computer Vision – ACCV 2010. ACCV 2010. Lecture Notes in Computer Science, vol 6494. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19318-7_54

  • DOI: https://doi.org/10.1007/978-3-642-19318-7_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19317-0

  • Online ISBN: 978-3-642-19318-7

  • eBook Packages: Computer Science, Computer Science (R0)
