
A Grammar Based Method for Video Event Indexing and Retrieval

Chapter in: Affective Computing and Intelligent Interaction

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 137)

Abstract

In this paper, we present a novel framework for indexing and retrieving video events. The framework has a hierarchical structure and is based on a grammatical model. We first define and detect a set of event primitives, then induce the parameters of a stochastic context-free grammar (SCFG) that describes the event sequences, using Liang's nonparametric HDP-SCFG model and a variational inference algorithm. The MES parser and the Viterbi algorithm are then employed to obtain the parse tree, which is subsequently indexed and matched. Our main contributions are as follows. 1) This is the first work to transfer a nonparametric grammar, the ISCFG, to the domain of video event retrieval. 2) The cross data link table structure used for indexing is efficient enough to support real-time search: locating a non-terminal at any time point has a computational complexity of O(log n). 3) The novel matching scheme allows users to search for content of interest by submitting a sample video segment as the query. The similarity measure works well because it takes both the structure and the content of the parse trees into account.
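The abstract does not describe the layout of the cross data link table, so what follows is only a minimal Python sketch of one way to locate a non-terminal by time point in logarithmic time. It rests on our own assumptions: each parse-tree node is tagged with the inclusive frame span it covers, and nodes at the same tree depth have disjoint spans, so each depth can be kept sorted by start frame and searched by bisection. The names `Node`, `LevelIndex`, `locate` and `path` and the frame-span representation are hypothetical and are not the authors' data structure.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical parse-tree node: a grammar symbol plus the frames it covers."""
    label: str
    start: int  # first frame covered (inclusive)
    end: int    # last frame covered (inclusive)
    children: list[Node] = field(default_factory=list)


class LevelIndex:
    """Hypothetical per-depth index over a parse tree.

    Nodes at the same depth cover disjoint frame spans, so each depth is
    stored sorted by start frame and queried with binary search.
    """

    def __init__(self, root: Node) -> None:
        self.levels: list[list[Node]] = []
        frontier = [root]
        while frontier:
            frontier.sort(key=lambda n: n.start)
            self.levels.append(frontier)
            # Children of the current depth form the next depth.
            frontier = [c for n in frontier for c in n.children]
        # Parallel lists of start frames, one per depth, for bisection.
        self.starts = [[n.start for n in level] for level in self.levels]

    def locate(self, t: int, depth: int) -> Node | None:
        """Return the node at `depth` whose span contains frame t (O(log n))."""
        if depth < 0 or depth >= len(self.levels):
            return None
        # Rightmost node whose start frame is <= t.
        i = bisect_right(self.starts[depth], t) - 1
        if i < 0:
            return None
        node = self.levels[depth][i]
        return node if node.start <= t <= node.end else None

    def path(self, t: int) -> list[str]:
        """Labels of every node covering frame t, ordered from root to leaf."""
        labels = []
        for depth in range(len(self.levels)):
            node = self.locate(t, depth)
            if node is not None:
                labels.append(node.label)
        return labels
```

For a toy tree whose root `EVENT` spans frames 0 to 99 and has children `APPROACH` (0 to 39, split into `walk` and `stop`) and `INTERACT` (40 to 99, split into `handover` at 40 to 69 and `leave`), `LevelIndex(root).path(50)` returns `['EVENT', 'INTERACT', 'handover']`. Each per-depth query costs O(log n) in the number of nodes at that depth. Sorting once at build time keeps queries cheap, which matches the real-time search goal stated above.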

References

  1. Xu, W.G., Lu, J.J., et al.: A unsupervised framework of video event analysis. To appear in Proceedings of the 2010 Second Pacific-Asia Conference on Knowledge Engineering and Software Engineering, Chongqing, China (December 2010)

  2. Ivanov, Y., Bobick, A.: Recognition of visual activities and interactions by stochastic parsing. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 852–872 (2000)

  3. Moore, D., Essa, I.: Recognizing multitasked activities from video using stochastic context-free grammar. In: National Conference on Artificial Intelligence, pp. 770–776 (2002)

  4. Stolcke, A.: An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics 21(2), 164–201 (1995)

  5. Liang, P., Jordan, M.I., Klein, D.: Probabilistic Grammars and Hierarchical Dirichlet Process. To appear in the Handbook of Applied Bayesian Analysis (2009)

  6. Lawrence, N.D.: Variational inference in probabilistic models. PhD thesis (2000)

  7. Earley, J.: An efficient context-free parsing algorithm. Communications of the ACM 13(2), 94–102 (1970)

  8. Teh, Y.W., Jordan, M.I., Beal, M., Blei, D.: Hierarchical Dirichlet processes. Journal of the American Statistical Association 101, 1566–1581 (2006)

  9. Lin, L., Gong, H., Wang, L.: Semantic event representation and recognition using syntactic attribute graph grammar. Pattern Recognition Letters 30, 180–186 (2009)

  10. Aggarwal, J.K., Cai, Q.: Human motion analysis: a review. Computer Vision and Image Understanding 73(3), 429–440 (1999)

  11. Turaga, P., Chellappa, R., Subrahmanian, V.S., Udrea, O.: Machine recognition of human activities: a survey. IEEE Transactions on Circuits and Systems for Video Technology 18(11), 1473–1488 (2008)

  12. Hu, W., Tan, T., Wang, L., Maybank, S.: A survey on visual surveillance of object motion and behaviours. IEEE Transactions on Systems, Man, and Cybernetics 34(3), 334–352 (2004)

  13. Morris, T., Trivedi, M.M.: A survey of vision-based trajectory learning and analysis for surveillance. IEEE Transactions on Circuits and Systems for Video Technology 18(8), 1114–1127 (2008)

  14. Xie, L., Sundaram, H., Campbell, M.: Event mining in multimedia streams. Proceedings of the IEEE 96(4), 623–647 (2008)

  15. Francois, A.R.J., Nevatia, R., Hobbs, J., Bolles, R.C.: VERL: An ontology framework for representing and annotating video events. IEEE Multimedia 12(4), 76–86 (2005)

  16. Nevatia, R., Zhao, T., Hongeng, S.: Hierarchical language-based representation of events in video streaming. In: Computer Vision and Pattern Recognition Workshop (2003)

  17. Ryoo, M.S., Aggarwal, J.K.: Semantic representation and recognition of continued and recursive human activities. International Journal of Computer Vision 82, 1–24 (2009)

  18. Yamamoto, M., Mitomi, H., Fujiwara, F., Tlikeato: Bayesian classification of task-oriented actions based on stochastic context-free grammar. In: Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition, USA, pp. 317–323 (2006)

  19. Minnen, D., Essa, I., Starner, T.: Expectation grammars: leveraging high-level expectation for activity recognition. In: Computer Vision and Pattern Recognition, vol. 2, pp. 626–632 (2003)

  20. Zhang, Z., Huang, K., Tan, T.: Multi-thread Parsing for Recognizing Complex Events in Videos. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 738–751. Springer, Heidelberg (2008)

  21. Ogale, A., Karapurkar, A., Aloimonos, Y.: View-invariant modeling and recognition of human actions using grammars. In: IEEE Workshop on Dynamical Vision (2005)

Copyright information

© 2012 Springer-Verlag GmbH Berlin Heidelberg

About this chapter

Cite this chapter

Xu, W., Zhang, Y., Lu, J., Wang, J. (2012). A Grammar Based Method for Video Event Indexing and Retrieval. In: Luo, J. (eds) Affective Computing and Intelligent Interaction. Advances in Intelligent and Soft Computing, vol 137. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27866-2_10

  • DOI: https://doi.org/10.1007/978-3-642-27866-2_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27865-5

  • Online ISBN: 978-3-642-27866-2

  • eBook Packages: Engineering, Engineering (R0)