Content-Based Retrieval of Functional Objects in Video Using Scene Context
Functional object recognition in video is an emerging problem in visual surveillance and video understanding. By functional objects, we mean objects with a specific purpose, such as a postman or a delivery truck, which are defined more by their actions and behaviors than by their appearance. In this work, we present an approach for content-based learning and recognition of the function of moving objects given video-derived tracks. In particular, we show that the semantic behaviors of movers can be captured in a location-independent manner by attributing them with features that encode their relations and actions with respect to scene contexts. By scene context, we mean local scene regions with different functionalities, such as doorways and parking spots, with which moving objects often interact. Based on these representations, functional models are learned from examples, and novel instances are then identified in unseen data. Furthermore, recognition in the presence of track fragmentation caused by imperfect tracking is addressed by a boosting-based track-linking classifier. Our experimental results highlight both promising and practical aspects of our approach.
Keywords: Functional Model, Functional Object, Scene Context, Defense Advanced Research Projects Agency
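The idea of location-independent, context-relative features described in the abstract can be illustrated with a minimal sketch. This is not the authors' feature set: the function name `context_features`, the specific features (minimum distance, fraction of time nearby, start/end proximity), the `radius` parameter, and the representation of scene-context regions as labeled center points are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def context_features(
    track: List[Point],
    regions: Dict[str, List[Point]],
    radius: float = 2.0,
) -> Dict[str, float]:
    """Describe a track by its relation to scene-context regions.

    Hypothetical sketch: `regions` maps a region type (e.g. "doorway")
    to a non-empty list of region centers. Only distances to regions are
    used, so the features do not depend on absolute scene coordinates.
    """
    features: Dict[str, float] = {}
    for kind, centers in regions.items():
        # Distance from each track point to the nearest region of this type.
        nearest = [min(math.dist(p, c) for c in centers) for p in track]
        features[f"min_dist_{kind}"] = min(nearest)
        # Fraction of track points lying within `radius` of such a region.
        features[f"frac_near_{kind}"] = sum(d <= radius for d in nearest) / len(nearest)
        # Whether the track starts or ends close to such a region.
        features[f"starts_near_{kind}"] = float(nearest[0] <= radius)
        features[f"ends_near_{kind}"] = float(nearest[-1] <= radius)
    return features


if __name__ == "__main__":
    track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    regions = {"doorway": [(0.0, 0.0)], "parking_spot": [(10.0, 1.0)]}
    for name, value in context_features(track, regions).items():
        print(name, value)
```

Because every feature is a function of track-to-region distances, shifting the whole scene leaves the feature vector unchanged, which is one simple way to read "location-independent". A feature vector of this kind could then be fed to any standard classifier trained on labeled example tracks.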