ANASTASIL: A System for Low-Level and High-Level Geometric Analysis of Printed Documents

Dengel, Andreas

doi:10.1007/978-3-642-77281-8_4


Abstract

This paper describes the knowledge-based document analysis system ANASTASIL (Analysis System to Interpret Areas in Single-sided Letters). The system identifies important conceptual parts (logical objects) of business letters, such as the recipient, the sender, or company-specific printing. It works completely independently of text recognition and relies only on geometric knowledge sources: global geometric knowledge about the arrangement of logical objects, and local geometric knowledge about their formal features (e.g., spatial extents and typical font sizes). A document image is classified by hypothesizing and testing geometric properties of the captured physical units (layout objects) and then labeling each area with the corresponding logical object designator. Because of this strategy, ANASTASIL can serve as a key for further, expectation-driven analysis of logical objects by text or graphics recognition. The system has been fully implemented and has achieved some remarkable results. It consists of a low-level geometric analysis module for image processing tasks and a high-level geometric analysis module that performs logical labeling of layout objects. It was implemented on a SUN 3/60 workstation in C and Common Lisp and will soon be available in the Macintosh environment.
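The hypothesize-and-test labeling summarized in the abstract can be illustrated with a minimal Python sketch. It is not the ANASTASIL implementation (which was written in C and Common Lisp), and the abstract does not specify how hypotheses are scored. The names LayoutObject, LogicalModel, score and label_blocks, the millimetre coordinates, the example regions and ranges, and the simple pass-fraction score with a 0.75 threshold are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutObject:
    """A rectangular block from low-level analysis (hypothetical, in mm)."""
    x: int
    y: int
    width: int
    height: int
    font_height: int  # estimated font size in points (assumed feature)


@dataclass(frozen=True)
class LogicalModel:
    """Assumed geometric expectations for one logical object."""
    label: str
    region: tuple        # global knowledge: (x_min, y_min, x_max, y_max)
    height_range: tuple  # local knowledge: allowed block height
    font_range: tuple    # local knowledge: typical font sizes


def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high


def score(obj, model):
    """Fraction of geometric tests that the block passes for this hypothesis."""
    x_min, y_min, x_max, y_max = model.region
    tests = [
        x_min <= obj.x and obj.x + obj.width <= x_max,
        y_min <= obj.y and obj.y + obj.height <= y_max,
        in_range(obj.height, model.height_range),
        in_range(obj.font_height, model.font_range),
    ]
    return sum(tests) / len(tests)


def label_blocks(blocks, models, threshold=0.75):
    """Hypothesize every label for every block and keep the best one that passes."""
    labels = []
    for block in blocks:
        best = max(models, key=lambda m: score(block, m))
        labels.append(best.label if score(block, best) >= threshold else None)
    return labels


# Illustrative models for an A4 letter (210 mm x 297 mm); the values are invented.
MODELS = [
    LogicalModel("recipient", (20, 45, 110, 90), (15, 45), (9, 12)),
    LogicalModel("sender", (0, 0, 210, 45), (5, 40), (8, 14)),
]

BLOCKS = [
    LayoutObject(25, 50, 80, 30, 10),     # address-like block
    LayoutObject(20, 10, 170, 25, 12),    # letterhead-like block
    LayoutObject(20, 110, 170, 120, 10),  # large body block, matches no model
]

if __name__ == "__main__":
    print(label_blocks(BLOCKS, MODELS))
```

Blocks that pass too few tests stay unlabeled (None), which mirrors the idea that a hypothesis is accepted only after its geometric properties have been tested. No text recognition is involved at any point.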

Author information

Authors and Affiliations

German Research Center for Artificial Intelligence (DFKI), Kaiserslautern Site, P.O. Box 20 80, D-6750, Kaiserslautern, Germany
Andreas Dengel


Editor information

Editors and Affiliations

Computing Science Research Center, AT&T Bell Laboratories, 600 Mountain Avenue, Room 2C-322, P. O. Box 636, 07974-0636, Murray Hill, NJ, USA
Henry S. Baird
Institut für Informatik und angewandte Mathematik, Universität Bern, Länggass-Str. 51, CH-3012, Bern, Switzerland
Horst Bunke
Machine Understanding Division, Electrotechnical Laboratory, 1-1-4, Umezono, 305, Tsukuba Science City Ibaraki, Japan
Kazuhiko Yamamoto

About this chapter

Cite this chapter

Dengel, A. (1992). ANASTASIL: A System for Low-Level and High-Level Geometric Analysis of Printed Documents. In: Baird, H.S., Bunke, H., Yamamoto, K. (eds) Structured Document Image Analysis. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-77281-8_4

DOI: https://doi.org/10.1007/978-3-642-77281-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-77283-2
Online ISBN: 978-3-642-77281-8
eBook Packages: Springer Book Archive
