A Self-Referential Perceptual Inference Framework for Video Interpretation

  • Conference paper
Computer Vision Systems (ICVS 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2626)

Abstract

This paper presents an extensible architectural model for general content-based analysis and indexing of video data that can be customised for a given problem domain. Video interpretation is approached as a joint inference problem that can be solved using modern machine learning and probabilistic inference techniques. An important aspect of the work is a novel active knowledge representation methodology based on an ontological query language. This representation allows the problem of video analysis to be posed as queries expressed in a visual language that incorporates prior hierarchical knowledge of the syntactic and semantic structure of the entities, relationships, and events of interest occurring in a video sequence. Perceptual inference then takes place within an ontological domain defined by the structure of the problem and the current goal set.
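To make the idea of a query over hierarchical prior knowledge concrete, here is a minimal sketch. It is not the authors' ontological query language. The Term class, the "and"/"or" operators, the example terms (face, body_blob, door_region) and the per-frame detector confidences are all hypothetical. Child terms are assumed to be conditionally independent, so a conjunction is scored as a product of probabilities and a disjunction as a noisy-OR. Frames are then ranked by how well they satisfy the query.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """Hypothetical ontology node: a leaf detector or a composite term."""
    name: str
    op: str = "leaf"  # "leaf", "and", or "or"
    children: List["Term"] = field(default_factory=list)


def evaluate(term: Term, evidence: Dict[str, float]) -> float:
    """Return P(term | evidence), assuming children are conditionally independent."""
    if term.op == "leaf":
        # Leaf terms read a detector confidence directly; missing evidence counts as 0.
        return evidence.get(term.name, 0.0)
    probs = [evaluate(child, evidence) for child in term.children]
    if term.op == "and":
        # Conjunction under independence: product of child probabilities.
        result = 1.0
        for p in probs:
            result *= p
        return result
    if term.op == "or":
        # Noisy-OR: the term is absent only if every child is absent.
        miss = 1.0
        for p in probs:
            miss *= 1.0 - p
        return 1.0 - miss
    raise ValueError(f"unknown operator: {term.op}")


# Hypothetical query: a person at a doorway, where "person" is supported by
# either a face detection or a body-shaped foreground blob.
person = Term("person", "or", [Term("face"), Term("body_blob")])
query = Term("person_at_door", "and", [person, Term("door_region")])

# Hypothetical per-frame detector confidences.
frames = {
    "frame_001": {"face": 0.9, "body_blob": 0.4, "door_region": 0.8},
    "frame_002": {"face": 0.1, "body_blob": 0.2, "door_region": 0.9},
}

# Rank frames by query probability, most relevant first.
ranked = sorted(frames, key=lambda f: evaluate(query, frames[f]), reverse=True)
for f in ranked:
    print(f, round(evaluate(query, frames[f]), 3))
```

With these made-up numbers, frame_001 scores about 0.752 and frame_002 about 0.252. Replacing the independence assumption with a Bayesian network would let correlated evidence, such as a face and a body blob produced by the same person, be modelled explicitly.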




Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Town, C., Sinclair, D. (2003). A Self-Referential Perceptual Inference Framework for Video Interpretation. In: Crowley, J.L., Piater, J.H., Vincze, M., Paletta, L. (eds) Computer Vision Systems. ICVS 2003. Lecture Notes in Computer Science, vol 2626. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36592-3_6


  • DOI: https://doi.org/10.1007/3-540-36592-3_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00921-4

  • Online ISBN: 978-3-540-36592-1

  • eBook Packages: Springer Book Archive
