
Document analysis by crosscount approach

Journal of Computer Science and Technology

Abstract

This paper introduces a new feature for document analysis, called crosscount. Crosscount is a function of the white line segments that start at the edge of a document image. It reflects not only the contour of the image but also the periodicity of white lines (background) and text lines in the document. Complex printed-page layouts contain different kinds of blocks, such as textual, graphical, and tabular ones. Of these, textual blocks show the most obvious periodicity, because their homogeneous white lines are regularly arranged. This important property of textual blocks can be extracted with crosscount functions. Document layouts are first classified into three classes on the basis of their physical structure. The definition and properties of the crosscount function are then described. Finally, following this classification, the application of the new feature to the analysis and understanding of different types of document images is discussed.
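The abstract does not state the formal definition of crosscount, so the following is only an illustrative sketch under a loudly stated assumption: for each pixel row, measure the length of the white run that starts at the left edge of a binary image (0 = white, 1 = black). The function names `left_white_run` and `white_line_rows` are hypothetical and are not taken from the paper. Under this reading, short runs trace the left contour of the content, while rows whose run spans the full width are background lines, and their spacing hints at the periodicity of a textual block.

```python
def left_white_run(image):
    """Assumed edge-anchored measure (not the paper's definition):
    for each row, count white pixels (0) from the left edge
    up to the first black pixel (1)."""
    runs = []
    for row in image:
        count = 0
        for pixel in row:
            if pixel != 0:
                break
            count += 1
        runs.append(count)
    return runs


def white_line_rows(image):
    """Indices of rows that are white across the full width,
    i.e. candidate background lines between text lines."""
    if not image:
        return []
    width = len(image[0])
    return [y for y, run in enumerate(left_white_run(image)) if run == width]


# Toy "page": three one-pixel-high text lines separated by white lines.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
]

profile = left_white_run(page)   # per-row white run from the left edge
gaps = white_line_rows(page)     # rows that are entirely background
spacing = [b - a for a, b in zip(gaps, gaps[1:])]  # distance between white lines
```

In this toy example the white lines recur at a constant spacing, which is the kind of regularity the abstract attributes to textual blocks. A graphical or tabular block would generally not produce such evenly spaced full-width white rows.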



Author information


Wang Haiqin received her B.S. degree from the University of Science and Technology of China and her M.S. degree from the Institute of Automation, Chinese Academy of Sciences. She is now a Ph.D. candidate at the University of Pittsburgh, USA.

Dai Ruwei graduated from the Department of Mathematics and Mechanics, Beijing University, in 1955. He has worked at the Chinese Academy of Sciences since 1956. From 1980 to 1982, he was a visiting scholar at the School of Electrical Engineering, Purdue University. He has published more than 150 articles on pattern recognition, artificial neural networks, and Chinese character recognition in China and abroad. He was elected a member of the Chinese Academy of Sciences in 1991. He is now Chairman of the Academic Committee of the Institute of Automation, Chinese Academy of Sciences, and chief editor of the Chinese journal «Pattern Recognition and Artificial Intelligence». His research interests are Chinese character recognition, artificial intelligence, and open giant complex systems.

About this article

Cite this article

Wang, H., Dai, R. Document analysis by crosscount approach. J. of Comput. Sci. & Technol. 13, 32–40 (1998). https://doi.org/10.1007/BF02946612

  • DOI: https://doi.org/10.1007/BF02946612
