ANASTASIL: A System for Low-Level and High-Level Geometric Analysis of Printed Documents

  • Andreas Dengel

Abstract

This paper presents the knowledge-based document analysis system ANASTASIL (Analysis System to Interpret Areas in Single-sided Letters). The system identifies important conceptual parts (logical objects) of business letters, such as the recipient, the sender, or company-specific printings. It works completely independently of text recognition and relies solely on geometric knowledge sources: global geometric knowledge about the arrangement of logical objects, and local geometric knowledge about formal features of logical objects (e.g., extents and typical font sizes). A document image is classified by labeling areas with the corresponding logical object designators after hypothesizing and testing geometric properties of the captured physical units (layout objects). Because of this strategy, ANASTASIL can serve as a key for further expectation-driven analysis of logical objects by text or graphic recognition. The system has been completely implemented and has achieved some remarkable results. It is composed of a low-level geometric analysis module for image processing tasks and a high-level geometric analysis module that performs the logical labeling of layout objects. The implementation was done on a SUN 3/60 workstation in C and Common Lisp and will soon be available in the Macintosh environment.
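
The hypothesize-and-test labeling described above can be illustrated with a minimal sketch. It is not the ANASTASIL implementation (which was written in C and Common Lisp). The rule table, the region coordinates, the height limits, the overlap-based score, and the names `LayoutObject`, `RULES`, `score`, and `label` are all assumptions made for illustration. Each rule combines a global expectation (where on the page a logical object usually lies) with a local one (a maximum block height).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutObject:
    """A physical block on the page, in fractions of page width/height."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


# Hypothetical geometric knowledge, invented for this sketch.
# "region" is the expected page area (x_min, y_min, x_max, y_max);
# "max_height" is a local constraint on the block itself.
RULES = {
    "sender": {"region": (0.0, 0.0, 1.0, 0.15), "max_height": 0.12},
    "recipient": {"region": (0.05, 0.15, 0.6, 0.4), "max_height": 0.2},
    "body": {"region": (0.05, 0.35, 0.95, 0.95), "max_height": 0.6},
}


def score(obj, rule):
    """Test one hypothesis: fraction of the block's area inside the
    expected region, or 0.0 if the local height constraint fails."""
    if obj.h > rule["max_height"]:
        return 0.0
    x0, y0, x1, y1 = rule["region"]
    overlap_w = max(0.0, min(obj.x + obj.w, x1) - max(obj.x, x0))
    overlap_h = max(0.0, min(obj.y + obj.h, y1) - max(obj.y, y0))
    area = obj.w * obj.h
    return (overlap_w * overlap_h) / area if area > 0 else 0.0


def label(obj, rules=RULES, threshold=0.5):
    """Hypothesize every logical label, keep the best-scoring one,
    and return None when no hypothesis reaches the threshold."""
    best_name, best_score = max(
        ((name, score(obj, rule)) for name, rule in rules.items()),
        key=lambda pair: pair[1],
    )
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    address_block = LayoutObject(x=0.1, y=0.2, w=0.4, h=0.1)
    print(label(address_block))
```

Under these assumed rules, a block in the upper left of the page below the letterhead is labeled as the recipient, while a block that fits none of the expected regions is left unlabeled. No text recognition is involved at any point, which mirrors the purely geometric strategy of the abstract.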

Keywords

Geometric Analysis, Document Image, Text Line, Logical Object, Print Document

Copyright information

© Springer-Verlag Berlin Heidelberg 1992

Authors and Affiliations

  • Andreas Dengel, German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany