(Mis?)-Using DRT for generation of natural language text from image sequences

Gerber, Ralf; Nagel, Hans-Hellmut

doi:10.1007/BFb0054746

Ralf Gerber¹ & Hans-Hellmut Nagel¹,²

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1407)

Included in the following conference series:

European Conference on Computer Vision

Abstract

The abundance of geometric results from image sequence evaluation, which is expected to become available shortly, creates a new problem: how to present this material to a user without inundating him with unwanted details? A system design appears to become necessary which copes not only with image sequence evaluation but also with an increasing number of abstraction steps required for efficient presentation and inspection of results. The system-user interaction of a Computer Vision system should thus be designed as a natural language dialogue, assigned within the overall system to what we call the ‘Natural Language Level’. Such a decision requires constructing a series of abstraction steps from geometric evaluation results to natural language text describing the contents of an image sequence. We suggest using Discourse Representation Theory, as developed by Kamp and Reyle [14], to design the system-internal representation of knowledge and results at the Natural Language Level. A first implementation of this approach is described, together with results obtained by applying it to image sequences recorded from real-world traffic scenes.

References

1. A. Abella and J.R. Kender: Description Generation of Abnormal Densities Found in Radiographs. Proc. Workshop on Conceptual Descriptions from Images, Cambridge/UK, 19 April 1996, H. Buxton (Ed.), pp. 97–111.
2. E. André, G. Herzog, and T. Rist: The System Soccer. Proc. of the 8th European Conference on Artificial Intelligence, Munich/Germany, 1–5 August 1988, pp. 449–454.
3. D.S. Bloomberg and F.R. Chen: Document Image Summarization without OCR. Proc. IEEE International Conference on Image Processing (ICIP '96), Lausanne/CH, 16–19 September 1996, Vol. II, pp. 229–232.
4. H. Buxton and S. Gong: Visual Surveillance in a Dynamic and Uncertain World. Artificial Intelligence 78 (1995) 431–459.
5. S. Dance, T. Caelli, and Z.-Q. Liu: Picture Interpretation: A Symbolic Approach. Series in Machine Perception and Artificial Intelligence Vol. 20, World Scientific, Singapore a. o. 1995.
6. S. Dance, T. Caelli, and Z.-Q. Liu: A Concurrent, Hierarchical Approach to Symbolic Scene Interpretation. Pattern Recognition 29:11 (1996) 1891–1903.
7. L. Friedman: From Images to Language. Proc. Workshop on Conceptual Descriptions from Images, Cambridge/UK, 19 April 1996, H. Buxton (Ed.), pp. 70–81.
8. R. Gerber and H.-H. Nagel: Berechnung natürlichsprachlicher Beschreibungen von Straßenverkehrsszenen aus Bildfolgen unter Verwendung von Geschehens- und Verdeckungsmodellierung. In B. Jähne, P. Geißler, H. Haußecker und F. Hering (Hrsg.), Mustererkennung 1996; 18. DAGM-Symposium, Heidelberg/Germany, 11.–13. September 1996, pp. 601–608 (in German).
9. R. Gerber and H.-H. Nagel: Knowledge Representation for the Generation of Quantified Natural Language Descriptions of Vehicle Traffic in Image Sequences. Proc. IEEE International Conference on Image Processing (ICIP '96), Lausanne/CH, 16–19 September 1996, Vol. II, pp. 805–808.
10. M. Haag and H.-H. Nagel: Beginning a Transition from a Local to a More Global Point of View in Model-Based Vehicle Tracking. H. Burkhardt and B. Neumann (Eds.): Proc. European Conference on Computer Vision 1998 (ECCV '98), Freiburg/Germany, 2–6 June 1998.
11. M. Haag, W. Theilmann, K.H. Schäfer, and H.-H. Nagel: Integration of Image Sequence Evaluation and Fuzzy Metric Temporal Logic Programming. KI-97: Advances in Artificial Intelligence, Proc. 21st Annual German Conference on Artificial Intelligence, Freiburg/Germany, 9–12 September 1997; G. Brewka, C. Habel, and B. Nebel (Eds.): Lecture Notes in Artificial Intelligence vol. 1303, Springer-Verlag Berlin, Heidelberg, New York 1997, pp. 301–312.
12. G. Herzog and P. Wazinski: Visual TRAnslator: Linking Perceptions and Natural Language Descriptions. Artificial Intelligence Review Journal 8 (1994) 175–187.
13. T. Huang, D. Koller, J. Malik, G. Ogasawara, B. Rao, S. Russell, and J. Weber: Automatic Symbolic Traffic Scene Analysis Using Belief Networks. Proc. 12th National Conference on Artificial Intelligence, Seattle/WA, 31 July–4 August 1994, pp. 966–972.
14. H. Kamp and U. Reyle: From Discourse to Logic. Kluwer Academic Publishers, Dordrecht/NL, Boston/MA, London/UK 1993.
15. H. Kollnig und H.-H. Nagel: Ermittlung von begrifflichen Beschreibungen von Geschehen in Straßenverkehrsszenen mit Hilfe unscharfer Mengen. Informatik - Forschung und Entwicklung 8 (1993) 186–196 (in German).
16. H. Kollnig and H.-H. Nagel: 3D Pose Estimation by Directly Matching Polyhedral Models to Gray Value Gradients. International Journal of Computer Vision 23:3 (1997) 283–302.
17. H.-H. Nagel, H. Kollnig, M. Haag, and H. Damm: The Association of Situation Graphs with Temporal Variations in Image Sequences. Working Notes AAAI-95 Fall Symposium Series ‘Computational Models for Integrating Language and Vision’, R.K. Srihari (Ed.), Cambridge/MA, 10–12 November 1995, pp. 1–8.
18. B. Neumann und H.-J. Novak: NAOS: Ein System zur natürlichsprachlichen Beschreibung zeitveränderlicher Szenen. Informatik - Forschung und Entwicklung 1 (1986) 83–92 (in German).
19. S. Satoh, Y. Nakamura, and T. Kanade: Name-It: Naming and Detecting Faces in Video by the Integration of Image and Natural Language Processing. Proc. 15th International Joint Conference on Artificial Intelligence (IJCAI '97), 23–29 August 1997, Nagoya/Japan, Vol. II, pp. 1488–1493.
20. K.H. Schäfer: Unscharfe zeitlogische Modellierung von Situationen und Handlungen in Bildfolgenauswertung und Robotik. Dissertation, Fakultät für Informatik der Universität Karlsruhe (TH), Juli 1996. Published in: Dissertationen zur Künstlichen Intelligenz (DISKI), Band 135, infix-Verlag St. Augustin 1996 (in German).
21. M.A. Smith and T. Kanade: Video Skimming and Characterization through the Combination of Image and Language Understanding Techniques. Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR '97), 17–19 June 1997, San Juan, Puerto Rico, pp. 775–781.
22. R.K. Srihari: Linguistic Context in Vision. Proc. IEEE Workshop on Context-Based Vision, Cambridge/MA, 19 June 1995, pp. 100–110.

Author information

Authors and Affiliations

Institut für Algorithmen und Kognitive Systeme, Fakultät für Informatik der Universität Karlsruhe (TH), Postfach 6980, D-76128, Karlsruhe, Germany
Ralf Gerber & Hans-Hellmut Nagel
Fraunhofer-Institut für Informations- und Datenverarbeitung (IITB), Fraunhoferstr. 1, D-76131, Karlsruhe, Germany
Hans-Hellmut Nagel

Editor information

Hans Burkhardt, Bernd Neumann

About this paper

Cite this paper

Gerber, R., Nagel, H.H. (1998). (Mis?)-Using DRT for generation of natural language text from image sequences. In: Burkhardt, H., Neumann, B. (eds) Computer Vision — ECCV’98. ECCV 1998. Lecture Notes in Computer Science, vol 1407. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0054746

DOI: https://doi.org/10.1007/BFb0054746
Published: 26 May 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64613-6
Online ISBN: 978-3-540-69235-5
eBook Packages: Springer Book Archive
