ANASTASIL: A System for Low-Level and High-Level Geometric Analysis of Printed Documents

Chapter in Structured Document Image Analysis

Abstract

This paper describes the knowledge-based document analysis system ANASTASIL (Analysis System to Interpret Areas in Single-sided Letters). The system identifies important conceptual parts (logical objects) within business letters, such as the recipient, the sender, or company-specific printing. It works entirely independently of text recognition and relies only on two geometric knowledge sources: global geometric knowledge about how logical objects are arranged on the page, and local geometric knowledge about formal features of logical objects (e.g. spatial extent and typical font sizes). A document image is classified by hypothesizing and testing geometric properties of the captured physical units (layout objects) and then labeling each area with the corresponding logical object designator. Because of this strategy, ANASTASIL can be seen as a key to expectation-driven further analysis of logical objects by text or graphics recognition. The system has been fully implemented and has achieved some remarkable results. It is composed of a low-level geometric analysis module for image processing tasks and a high-level geometric analysis module that performs logical labeling of layout objects. The implementation was done on a SUN 3/60 workstation in C and Common Lisp and will soon be available in the Macintosh environment.
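The hypothesize-and-test labeling described in the abstract can be illustrated with a minimal sketch. It is not the ANASTASIL implementation (which was written in C and Common Lisp); the names `Block`, `Model`, `overlap_fraction`, and `label_block`, the page-relative coordinates, the line-count feature, and the 0.5 threshold are all assumptions made for illustration. Position on the page stands in for global geometric knowledge, and an upper bound on line count stands in for local geometric knowledge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A layout object: bounding box in page-relative units (0..1)."""
    x: float
    y: float
    w: float
    h: float
    lines: int  # number of text lines found by segmentation


@dataclass(frozen=True)
class Model:
    """Hypothetical geometric model of one logical object."""
    label: str
    region: tuple   # (x0, y0, x1, y1): expected page area (global knowledge)
    max_lines: int  # typical upper bound on line count (local knowledge)


def overlap_fraction(block, region):
    """Fraction of the block's area that lies inside the expected region."""
    area = block.w * block.h
    if area <= 0:
        return 0.0
    x0, y0, x1, y1 = region
    dx = min(block.x + block.w, x1) - max(block.x, x0)
    dy = min(block.y + block.h, y1) - max(block.y, y0)
    if dx <= 0 or dy <= 0:
        return 0.0
    return (dx * dy) / area


def label_block(block, models, threshold=0.5):
    """Return the best-matching label, or None if no hypothesis survives."""
    best_label, best_score = None, threshold
    for m in models:
        # Hypothesize: how well does the block's position match the model?
        score = overlap_fraction(block, m.region)
        # Test: reject the hypothesis if a local feature contradicts it.
        if block.lines > m.max_lines:
            continue
        if score > best_score:
            best_label, best_score = m.label, score
    return best_label


# Assumed example models for a business letter (illustrative values only).
MODELS = [
    Model("recipient", (0.0, 0.15, 0.5, 0.35), max_lines=6),
    Model("sender", (0.5, 0.0, 1.0, 0.15), max_lines=5),
]

address_block = Block(0.1, 0.2, 0.3, 0.1, lines=4)
body_block = Block(0.1, 0.4, 0.8, 0.4, lines=20)
address_label = label_block(address_block, MODELS)
body_label = label_block(body_block, MODELS)
```

No character recognition is involved at any point: the label is decided from the block's geometry alone, which is what makes such a result usable as an expectation for later text or graphics recognition.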

Copyright information

© 1992 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Dengel, A. (1992). ANASTASIL: A System for Low-Level and High-Level Geometric Analysis of Printed Documents. In: Baird, H.S., Bunke, H., Yamamoto, K. (eds) Structured Document Image Analysis. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-77281-8_4

  • DOI: https://doi.org/10.1007/978-3-642-77281-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-77283-2

  • Online ISBN: 978-3-642-77281-8

  • eBook Packages: Springer Book Archive
