Abstract
This paper introduces a new feature for document analysis called crosscount. The crosscount feature is a function of the white line segments that start on the edge of a document image. It reflects not only the contour of the image but also the periodicity of the white lines (background) and text lines in the document. Complex printed-page layouts contain different kinds of blocks, such as textual, graphical, and tabular blocks. Of these, textual blocks have the most obvious periodicity, with their homogeneous white lines arranged regularly. Crosscount functions can extract this important property of textual blocks. The paper first classifies document layouts into three classes on the basis of their physical structures, then describes the definition and properties of the crosscount function. Finally, following this classification, it discusses the application of the new feature to the analysis and understanding of different types of document images.
Author information
Wang Haiqin received her B.S. degree from the University of Science and Technology of China and her M.S. degree from the Institute of Automation, Chinese Academy of Sciences. She is now a Ph.D. candidate at the University of Pittsburgh, USA.
Dai Ruwei graduated from the Department of Mathematics and Mechanics, Beijing University, in 1955. He has been working at the Chinese Academy of Sciences since 1956. From 1980 to 1982, he was a visiting scholar at the School of Electrical Engineering, Purdue University. He has published more than 150 articles on pattern recognition, artificial neural networks, and Chinese character recognition in China and abroad. He was elected a member of the Chinese Academy of Sciences in 1991. He is now the Chairman of the Academic Committee, Institute of Automation, Chinese Academy of Sciences, and the chief editor of the Chinese journal «Pattern Recognition and Artificial Intelligence». His research interests are Chinese character recognition, artificial intelligence, and open giant complex systems.
Cite this article
Wang, H., Dai, R. Document analysis by crosscount approach. J. of Comput. Sci. & Technol. 13, 32–40 (1998). https://doi.org/10.1007/BF02946612
DOI: https://doi.org/10.1007/BF02946612