Abstract
In this paper, we present a novel framework for indexing and retrieving video events. The framework has a hierarchical structure and is based on a grammatical model. We first define and detect a set of event primitives, then induce the parameters of a stochastic context-free grammar (SCFG) that describes the event sequences, using Liang’s nonparametric HDP-SCFG model and a variational inference algorithm. The MES parser and the Viterbi algorithm are then employed to obtain the parse tree, which is subsequently indexed and matched. Our contributions fall into four aspects. 1) This is the first time that a nonparametric grammar, ISCFG, has been transplanted to the domain of video event retrieval. 2) The cross data link table structure used for indexing is efficient enough for real-time search: the computational complexity of locating a non-terminal at any time point is O(log). 3) The novel matching scheme lets users search for content of interest by supplying a sample video segment as input. The similarity measure works well because it takes both the structure and the content of the parse trees into account.
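To make the parsing step concrete, the following is a minimal sketch of Viterbi decoding over a sequence of event primitives under a toy SCFG. It is not the chapter's MES parser: it swaps in a CYK-style chart over a grammar in Chomsky normal form, and the grammar symbols, primitive names, rule probabilities, and the function name `viterbi_parse` are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical toy SCFG in Chomsky normal form. Symbols and probabilities
# are illustrative assumptions, not taken from the chapter.
BINARY_RULES = {
    # (parent, left child, right child): probability
    ("EVENT", "APPROACH", "LEAVE"): 1.0,
    ("APPROACH", "MOVE", "STOP"): 0.7,
    ("LEAVE", "START", "MOVE"): 1.0,
}
LEXICAL_RULES = {
    # (parent, event primitive): probability
    ("APPROACH", "stop"): 0.3,
    ("MOVE", "move"): 1.0,
    ("STOP", "stop"): 1.0,
    ("START", "start"): 1.0,
}


def viterbi_parse(primitives, start="EVENT"):
    """Return (most probable parse tree, its probability), or (None, 0.0)."""
    n = len(primitives)
    best = {}  # (i, j, symbol) -> best log-probability for span [i, j)
    back = {}  # (i, j, symbol) -> primitive, or (split, left, right)
    neg_inf = -math.inf

    # Width-1 spans: match each primitive against the lexical rules.
    for i, primitive in enumerate(primitives):
        for (parent, terminal), p in LEXICAL_RULES.items():
            key = (i, i + 1, parent)
            if terminal == primitive and math.log(p) > best.get(key, neg_inf):
                best[key] = math.log(p)
                back[key] = primitive

    # Wider spans: keep the highest-scoring split and rule per symbol.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, left, right), p in BINARY_RULES.items():
                    score = (math.log(p)
                             + best.get((i, k, left), neg_inf)
                             + best.get((k, j, right), neg_inf))
                    if score > best.get((i, j, parent), neg_inf):
                        best[(i, j, parent)] = score
                        back[(i, j, parent)] = (k, left, right)

    if (0, n, start) not in best:
        return None, 0.0

    def build(i, j, symbol):
        # Follow back-pointers to rebuild the tree as nested tuples.
        entry = back[(i, j, symbol)]
        if isinstance(entry, str):
            return (symbol, entry)
        k, left, right = entry
        return (symbol, build(i, k, left), build(k, j, right))

    return build(0, n, start), math.exp(best[(0, n, start)])


tree, prob = viterbi_parse(["move", "stop", "start", "move"])
print(tree, prob)
```

Each chart cell records the span it covers, so a tree built this way carries the time extent of every non-terminal, which is the kind of information an index over parse trees would need. In the chapter's setting the rule probabilities would come from the induced grammar rather than being written by hand.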
References
Xu, W.G., Lu, J.J., et al.: An unsupervised framework of video event analysis. To appear in Proceedings of the 2010 Second Pacific-Asia Conference on Knowledge Engineering and Software Engineering, Chongqing, China (December 2010)
Ivanov, Y., Bobick, A.: Recognition of visual activities and interactions by stochastic parsing. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 852–872 (2000)
Moore, D., Essa, I.: Recognizing multitasked activities from video using stochastic context-free grammar. In: National Conference on Artificial Intelligence, pp. 770–776 (2002)
Stolcke, A.: An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics 21(2), 164–201 (1995)
Liang, P., Jordan, M.I., Klein, D.: Probabilistic grammars and hierarchical Dirichlet processes. To appear in the Handbook of Applied Bayesian Analysis (2009)
Lawrence, N.D.: Variational inference in probabilistic models, PhD Thesis (2000)
Earley, J.: An efficient context-free parsing algorithm. Communications of the ACM 13(2), 94–102 (1970)
Teh, Y.W., Jordan, M.I., Beal, M., Blei, D.: Hierarchical Dirichlet processes. Journal of the American Statistical Association 101, 1566–1581 (2006)
Lin, L., Gong, H., Wang, L.: Semantic event representation and recognition using syntactic attribute graph grammar. Pattern Recognition Letters 30, 180–186 (2009)
Aggarwal, J.K., Cai, Q.: Human motion analysis: a review. Computer Vision and Image Understanding 73(3), 429–440 (1999)
Turaga, P., Chellappa, R., Subrahmanian, V.S., Udrea, O.: Machine recognition of human activities: a survey. IEEE Transactions on Circuits and Systems for Video Technology 18(11), 1473–1488 (2008)
Hu, W., Tan, T., Wang, L., Maybank, S.: A survey on visual surveillance of object motion and behaviors. IEEE Transactions on Systems, Man, and Cybernetics 34(3), 334–352 (2004)
Morris, T., Trivedi, M.M.: A survey of vision-based trajectory learning and analysis for surveillance. IEEE Transactions on Circuits and Systems for Video Technology 18(8), 1114–1127 (2008)
Xie, L., Sundaram, H., Campbell, M.: Event mining in multimedia streams. Proceedings of the IEEE 96(4), 623–647 (2008)
Francois, A.R.J., Nevatia, R., Hobbs, J., Bolles, R.C.: VERL: An ontology framework for representing and annotating video events. IEEE Multimedia 12(4), 76–86 (2005)
Nevatia, R., Zhao, T., Hongeng, S.: Hierarchical language-based representation of events in video streaming. In: Computer Vision and Pattern Recognition Workshop (2003)
Ryoo, M.S., Aggarwal, J.K.: Semantic representation and recognition of continued and recursive human activities. International Journal of Computer Vision 82, 1–24 (2009)
Yamamoto, M., Mitomi, H., Fujiwara, F., Tlikeato: Bayesian classification of task-oriented actions based on stochastic context-free grammar. In: Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition, USA, pp. 317–323 (2006)
Minnen, D., Essa, I., Starner, T.: Expectation grammars: leveraging high-level expectation for activity recognition. In: Computer Vision and Pattern Recognition, vol. 2, pp. 626–632 (2003)
Zhang, Z., Huang, K., Tan, T.: Multi-thread Parsing for Recognizing Complex Events in Videos. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 738–751. Springer, Heidelberg (2008)
Ogale, A., Karapurkar, A., Aloimonos, Y.: View-invariant modeling and recognition of human actions using grammars. In: IEEE Workshop on Dynamical Vision (2005)
Copyright information
© 2012 Springer-Verlag GmbH Berlin Heidelberg
About this chapter
Cite this chapter
Xu, W., Zhang, Y., Lu, J., Wang, J. (2012). A Grammar Based Method for Video Event Indexing and Retrieval. In: Luo, J. (eds) Affective Computing and Intelligent Interaction. Advances in Intelligent and Soft Computing, vol 137. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27866-2_10
DOI: https://doi.org/10.1007/978-3-642-27866-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27865-5
Online ISBN: 978-3-642-27866-2
eBook Packages: Engineering, Engineering (R0)