Abstract
In document image processing, text/graphics separation is a key step that determines the performance of recognition and indexing systems. It involves identifying and separating the graphical and textual components of a document image, so effective approaches to this problem are needed. This paper presents a method for separating the textual and non-textual components of document images using graph-based modeling and structural analysis. The method is fast and efficient, and it adequately separates the graphical and textual areas of a document. Examples on technical documents and magazines drawn from databases recognized by the community are used to validate the approach.
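The abstract does not spell out the graph model, so the following is only a minimal sketch of one common way to realize graph-based text/graphics separation, not the authors' algorithm. It assumes a binarized image given as a list of rows (1 = ink). Each connected component becomes a graph node. Two components are linked when they sit on the same horizontal line, are separated by a small gap, and have similar heights. Groups of at least `min_group` small components are labeled text, and everything else is labeled graphics. The function names and thresholds (`max_gap`, `max_height_ratio`, `min_group`, `max_char_height`) are hypothetical.

```python
from collections import deque


def connected_components(img):
    """Label 8-connected ink pixels; return bounding boxes (top, left, bottom, right)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def are_neighbours(a, b, max_gap, max_height_ratio):
    """Hypothetical edge rule: similar heights, vertical overlap, small horizontal gap."""
    ha, hb = a[2] - a[0] + 1, b[2] - b[0] + 1
    if max(ha, hb) > max_height_ratio * min(ha, hb):
        return False
    gap_x = max(0, max(a[1], b[1]) - min(a[3], b[3]) - 1)  # 0 if columns overlap
    gap_y = max(0, max(a[0], b[0]) - min(a[2], b[2]) - 1)  # 0 if rows overlap
    return gap_x <= max_gap and gap_y == 0


def separate(img, max_gap=2, max_height_ratio=2.0, min_group=3, max_char_height=10):
    """Return (text_boxes, graphic_boxes) using hypothetical grouping thresholds."""
    boxes = connected_components(img)
    parent = list(range(len(boxes)))

    def find(i):  # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Build the component graph implicitly by merging every linked pair.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if are_neighbours(boxes[i], boxes[j], max_gap, max_height_ratio):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)

    text, graphics = [], []
    for members in groups.values():
        small = all(boxes[k][2] - boxes[k][0] + 1 <= max_char_height for k in members)
        target = text if small and len(members) >= min_group else graphics
        target.extend(boxes[k] for k in members)
    return sorted(text), sorted(graphics)
```

Union-find makes grouping transitive, so a chain of closely spaced characters ends up in a single text group even when its first and last characters are far apart. The thresholds would have to be tuned to the resolution of the documents being processed.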
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zirari, F., Mammass, D., Ennaji, A., Nicolas, S. (2012). A Graph Based Approach for Heterogeneous Document Segmentation. In: Elmoataz, A., Mammass, D., Lezoray, O., Nouboud, F., Aboutajdine, D. (eds) Image and Signal Processing. ICISP 2012. Lecture Notes in Computer Science, vol 7340. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31254-0_48
DOI: https://doi.org/10.1007/978-3-642-31254-0_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31253-3
Online ISBN: 978-3-642-31254-0
eBook Packages: Computer Science, Computer Science (R0)