Retrieving information from document images: problems and solutions

  • Fu Chang
SI: Document Analysis for Office Systems

Abstract.

An information retrieval system that captures both the visual and the textual content of paper documents can derive maximal benefit from document analysis and recognition (DAR) techniques while demanding little human assistance to achieve its goals. This article discusses the technical problems involved, methods for solving them, and the integration of those methods into a well-performing system. The discussion focuses on particularly difficult applications, such as Chinese and Japanese documents. Emphasis is placed on some new ideas, including window-based binarization using scale measures, document layout analysis for solving the multiple constraint problem, and full-text searching techniques capable of evading machine recognition errors.

Key words: Binarization – Document image analysis – Error-tolerant full text search – Information retrieval – Multiple constraint problem 

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Fu Chang (1)
  1. Document Analysis and Recognition Laboratory, Institute of Information Science 20, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan R.O.C.; e-mail: fchang@iis.sinica.edu.tw
