Abstract
Although image understanding and natural language processing constitute two major areas of AI, they have mostly been studied independently of each other. Only a few attempts have been concerned with the integration of computer vision and the generation of natural language expressions for the description of image sequences.
The aim of our joint effort to combine a vision system with a natural language access system is the automatic simultaneous description of dynamic imagery, i.e., we are interested in image interpretation and language processing on an incremental basis. In this contribution we sketch an approach towards the integration of the Karlsruhe vision system ACTIONS and the natural language component VITRA developed in Saarbrücken. The steps toward realization, based on available components, are outlined and the capabilities of the current system are demonstrated.
Summary
Although image understanding and natural language processing are two of the core areas of AI, they have so far been studied almost independently of each other. Only very few approaches have addressed the integration of machine vision and the generation of natural language utterances for describing image sequences.
The goal of our collaboration on coupling an image understanding system with a natural language access system is the automatic simultaneous description of time-varying scenes, i.e., we are interested in image sequence interpretation and language processing on an incremental basis. In this contribution we describe an approach to integrating the Karlsruhe image sequence analysis system ACTIONS with the natural language component VITRA, which is being developed in Saarbrücken. The steps toward realization, based on components that are already available, are presented, and the capabilities of the currently existing system are demonstrated.
Copyright information
© 1989 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Herzog, G. et al. (1989). Incremental Natural Language Description of Dynamic Imagery. In: Brauer, W., Freksa, C. (eds) Wissensbasierte Systeme. Informatik-Fachberichte, vol 227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-75182-0_15
DOI: https://doi.org/10.1007/978-3-642-75182-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-51838-9
Online ISBN: 978-3-642-75182-0
eBook Packages: Springer Book Archive