Abstract
This paper focuses on the knowledge-based document analysis system ANASTASIL (Analysis System to Interpret Areas in Single-sided Letters). The system identifies important conceptual parts (logical objects) within business letters, such as the recipient, the sender, or company-specific printings. The system works completely independently of text recognition; instead, it uses only geometric knowledge sources. These are global geometric knowledge about the arrangement of logical objects, and local geometric knowledge about the formal features of logical objects (e.g. extensions, typical font sizes, etc.). A document image is classified by hypothesizing and testing geometric properties of the captured physical units (layout objects) and then labeling these areas with the corresponding logical object designators. Because of this strategy, ANASTASIL can be viewed as a key to expectation-driven further analysis of logical objects by text or graphics recognition. The system has been completely implemented and has achieved some remarkable results. It is composed of a low-level geometric analysis module for image processing tasks and a high-level geometric analysis module that performs the logical labeling of layout objects. The implementation was done in C and Common Lisp on a SUN 3/60 workstation and will soon be available in the Macintosh environment.
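To make the hypothesize-and-test labeling idea concrete, here is a minimal sketch. It is not ANASTASIL's actual knowledge representation or control strategy, which the abstract does not detail. It assumes that low-level analysis yields layout objects as page-normalized bounding boxes with an estimated font size. The names `LayoutObject`, `LogicalModel`, `centre_in`, `label_layout` and `MODELS`, as well as every threshold and region, are hypothetical. Each logical object pairs a local test (formal features such as height and font size) with a global test (its expected location on the page). A label is hypothesized for each still-unlabeled layout object in turn and is accepted for the first object that passes both tests.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Region = Tuple[float, float, float, float]  # (x0, y0, x1, y1), page-normalized


@dataclass
class LayoutObject:
    """A physical unit from low-level analysis (hypothetical representation)."""
    x: float        # left edge, fraction of page width
    y: float        # top edge, fraction of page height (0 = top of page)
    w: float        # width, fraction of page width
    h: float        # height, fraction of page height
    font_pt: float  # estimated dominant font size in points


@dataclass
class LogicalModel:
    """Geometric knowledge about one logical object (hypothetical values)."""
    name: str
    local_test: Callable[[LayoutObject], bool]  # local knowledge: formal features
    region: Region                              # global knowledge: expected centre location


def centre_in(obj: LayoutObject, region: Region) -> bool:
    """Global test: does the object's centre fall inside the expected region?"""
    cx, cy = obj.x + obj.w / 2, obj.y + obj.h / 2
    x0, y0, x1, y1 = region
    return x0 <= cx <= x1 and y0 <= cy <= y1


# Illustrative knowledge for three logical objects; every threshold is an assumption.
MODELS: List[LogicalModel] = [
    LogicalModel("printing", lambda o: o.font_pt >= 14, (0.0, 0.0, 1.0, 0.12)),
    LogicalModel("sender", lambda o: o.font_pt <= 8 and o.h <= 0.02, (0.0, 0.12, 0.5, 0.2)),
    LogicalModel("recipient", lambda o: 9 <= o.font_pt <= 12 and 0.04 <= o.h <= 0.12,
                 (0.0, 0.15, 0.5, 0.35)),
]


def label_layout(objects: List[LayoutObject]) -> Dict[int, str]:
    """Assign each logical label to at most one layout object.

    For every model, each still-unlabeled object is hypothesized to be that
    logical object and tested against the local and global knowledge; the
    first object that passes both tests receives the label.
    """
    labels: Dict[int, str] = {}
    for model in MODELS:
        for i, obj in enumerate(objects):
            if i in labels:
                continue
            if model.local_test(obj) and centre_in(obj, model.region):
                labels[i] = model.name
                break
    return labels


# Usage on a made-up page layout.
page = [
    LayoutObject(x=0.1, y=0.03, w=0.8, h=0.05, font_pt=18),   # letterhead
    LayoutObject(x=0.1, y=0.16, w=0.35, h=0.01, font_pt=7),   # one-line sender
    LayoutObject(x=0.1, y=0.19, w=0.35, h=0.08, font_pt=11),  # address block
    LayoutObject(x=0.1, y=0.45, w=0.8, h=0.3, font_pt=11),    # body text
]
print(label_layout(page))  # → {0: 'printing', 1: 'sender', 2: 'recipient'}
```

The body text stays unlabeled because it fails every local test, which is the point at which a real system could hand the remaining areas to text or graphics recognition. A first-match rule is used here only for brevity; a system could instead score competing hypotheses and keep the best-supported one.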
Copyright information
© 1992 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Dengel, A. (1992). ANASTASIL: A System for Low-Level and High-Level Geometric Analysis of Printed Documents. In: Baird, H.S., Bunke, H., Yamamoto, K. (eds) Structured Document Image Analysis. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-77281-8_4
DOI: https://doi.org/10.1007/978-3-642-77281-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-77283-2
Online ISBN: 978-3-642-77281-8
eBook Packages: Springer Book Archive