Abstract
The abundance of geometric results from image sequence evaluation that is expected to become available shortly creates a new problem: how can this material be presented to a user without inundating them with unwanted details? A system design appears to become necessary that copes not only with image sequence evaluation but also with the growing number of abstraction steps required for efficient presentation and inspection of results. The system-user interaction of a Computer Vision system should thus be designed as a natural language dialogue, assigned within the overall system to what we call the ‘Natural Language Level’. Such a decision requires constructing a series of abstraction steps that lead from geometric evaluation results to natural language text describing the contents of an image sequence. We suggest using Discourse Representation Theory, as developed by [14], to design the system-internal representation of knowledge and results at the Natural Language Level. We describe a first implementation of this approach and the results obtained by applying it to image sequences recorded from real-world traffic scenes.
References
A. Abella and J.R. Kender: Description Generation of Abnormal Densities Found in Radiographs. Proc. Workshop on Conceptual Descriptions from Images, Cambridge/UK, 19 April 1996, H. Buxton (Ed.), pp. 97–111.
E. André, G. Herzog, and T. Rist: The System Soccer. Proc. of the 8th European Conference on Artificial Intelligence, Munich/Germany, 1–5 August 1988, pp. 449–454.
D.S. Bloomberg and F.R. Chen: Document Image Summarization without OCR. Proc. IEEE International Conference on Image Processing (ICIP '96), Lausanne/CH, 16–19 September 1996, Vol. II, pp. 229–232.
H. Buxton and S. Gong: Visual Surveillance in a Dynamic and Uncertain World. Artificial Intelligence 78 (1995) 431–459.
S. Dance, T. Caelli, and Z.-Q. Liu: Picture Interpretation: A Symbolic Approach. Series in Machine Perception and Artificial Intelligence Vol. 20, World Scientific, Singapore a. o. 1995.
S. Dance, T. Caelli, and Z.-Q. Liu: A Concurrent, Hierarchical Approach to Symbolic Scene Interpretation. Pattern Recognition 29:11 (1996) 1891–1903.
L. Friedman: From Images to Language. Proc. Workshop on Conceptual Descriptions from Images, Cambridge/UK, 19 April 1996, H. Buxton (Ed.), pp. 70–81.
R. Gerber and H.-H. Nagel: Berechnung natürlichsprachlicher Beschreibungen von Straßenverkehrsszenen aus Bildfolgen unter Verwendung von Geschehens- und Verdeckungsmodellierung. In B. Jähne, P. Geißler, H. Haußecker, and F. Hering (Eds.), Mustererkennung 1996; 18. DAGM-Symposium, Heidelberg/Germany, 11–13 September 1996, pp. 601–608 (in German).
R. Gerber and H.-H. Nagel: Knowledge Representation for the Generation of Quantified Natural Language Descriptions of Vehicle Traffic in Image Sequences. Proc. IEEE International Conference on Image Processing (ICIP '96), Lausanne/CH, 16–19 September 1996, Vol. II, pp. 805–808.
M. Haag and H.-H. Nagel: Beginning a Transition from a Local to a More Global Point of View in Model-Based Vehicle Tracking. H. Burkhardt and B. Neumann (Eds.): Proc. European Conference on Computer Vision 1998 (ECCV '98), Freiburg/Germany, 2–6 June 1998.
M. Haag, W. Theilmann, K.H. Schäfer, and H.-H. Nagel: Integration of Image Sequence Evaluation and Fuzzy Metric Temporal Logic Programming. KI-97: Advances in Artificial Intelligence, Proc. 21st Annual German Conference on Artificial Intelligence, Freiburg/Germany, 9–12 September 1997; G. Brewka, C. Habel, and B. Nebel (Eds.): Lecture Notes in Artificial Intelligence vol. 1303, Springer-Verlag Berlin, Heidelberg, New York 1997, pp. 301–312.
G. Herzog and P. Wazinski: Visual TRAnslator: Linking Perceptions and Natural Language Descriptions. Artificial Intelligence Review Journal 8 (1994) 175–187.
T. Huang, D. Koller, J. Malik, G. Ogasawara, B. Rao, S. Russell, and J. Weber: Automatic Symbolic Traffic Scene Analysis Using Belief Networks. Proc. 12th National Conference on Artificial Intelligence, Seattle/WA, 31 July–4 August 1994, pp. 966–972.
H. Kamp and U. Reyle: From Discourse to Logic. Kluwer Academic Publishers, Dordrecht/NL, Boston/MA, London/UK 1993.
H. Kollnig and H.-H. Nagel: Ermittlung von begrifflichen Beschreibungen von Geschehen in Straßenverkehrsszenen mit Hilfe unscharfer Mengen. Informatik - Forschung und Entwicklung 8 (1993) 186–196 (in German).
H. Kollnig and H.-H. Nagel: 3D Pose Estimation by Directly Matching Polyhedral Models to Gray Value Gradients. International Journal of Computer Vision 23:3 (1997) 283–302.
H.-H. Nagel, H. Kollnig, M. Haag, and H. Damm: The Association of Situation Graphs with Temporal Variations in Image Sequences. Working Notes AAAI-95 Fall Symposium Series ‘Computational Models for Integrating Language and Vision’, R.K. Srihari (Ed.), Cambridge/MA, 10–12 November 1995, pp. 1–8.
B. Neumann and H.-J. Novak: NAOS: Ein System zur natürlichsprachlichen Beschreibung zeitveränderlicher Szenen. Informatik - Forschung und Entwicklung 1 (1986) 83–92 (in German).
S. Satoh, Y. Nakamura, and T. Kanade: Name-It: Naming and Detecting Faces in Video by the Integration of Image and Natural Language Processing. Proc. 15th International Joint Conference on Artificial Intelligence (IJCAI '97), 23–29 August 1997, Nagoya/Japan, Vol. II, pp. 1488–1493.
K.H. Schäfer: Unscharfe zeitlogische Modellierung von Situationen und Handlungen in Bildfolgenauswertung und Robotik. Dissertation, Fakultät für Informatik der Universität Karlsruhe (TH), July 1996. Published in: Dissertationen zur Künstlichen Intelligenz (DISKI), Vol. 135, infix-Verlag St. Augustin 1996 (in German).
M.A. Smith and T. Kanade: Video Skimming and Characterization through the Combination of Image and Language Understanding Techniques. Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR '97), 17–19 June 1997, San Juan, Puerto Rico, pp. 775–781.
R.K. Srihari: Linguistic Context in Vision. Proc. IEEE Workshop on Context-Based Vision, Cambridge/MA, 19 June 1995, pp. 100–110.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gerber, R., Nagel, H.H. (1998). (Mis?)-Using DRT for generation of natural language text from image sequences. In: Burkhardt, H., Neumann, B. (eds) Computer Vision — ECCV’98. ECCV 1998. Lecture Notes in Computer Science, vol 1407. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0054746
DOI: https://doi.org/10.1007/BFb0054746
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64613-6
Online ISBN: 978-3-540-69235-5
eBook Packages: Springer Book Archive