Memory and Expectations in Learning, Language, and Visual Understanding

Abstract

Research in vision and language has traditionally remained separate, in part because the classic task of generating a representation of a given image or sentence has led to an emphasis on the low-level structural aspects of these media. In this paper we argue that image and language understanding should be approached with the intent of facilitating the performance of a task. Under this view, research in image and language understanding must confront the common issues that arise as a task is pursued. Language and images are both inputs that can be used to maintain a model of a task. We argue that such a model may be maintained by incorporating changes in the scene that can be characterized at a high level of abstraction yet manifest themselves at relatively low levels of analysis. Existing task-relevant models and the associated domain knowledge are used to expect specific changes and to disambiguate the interpretation of these changes, thereby allowing them to modify the existing model. From this perspective, understanding input is largely independent of the modality of the input.

Copyright information

© 1995 Kluwer Academic Publishers

About this chapter

Cite this chapter

Schank, R.C., Fano, A. (1995). Memory and Expectations in Learning, Language, and Visual Understanding. In: Mc Kevitt, P. (ed.) Integration of Natural Language and Vision Processing. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-1639-5_3

  • DOI: https://doi.org/10.1007/978-94-009-1639-5_3

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-0-7923-3944-1

  • Online ISBN: 978-94-009-1639-5

  • eBook Packages: Springer Book Archive
