A Graph Based Approach for Heterogeneous Document Segmentation

  • Fattah Zirari
  • Driss Mammass
  • Abdellatif Ennaji
  • Stephane Nicolas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7340)

Abstract

In document image processing, text/graphics separation is a major step that conditions the performance of recognition and indexing systems. It involves identifying and separating the graphical and textual components of a document image, so effective approaches to this problem are needed. This paper presents a method for separating textual and non-textual components in document images using graph-based modeling and structural analysis. The method is fast and efficiently separates the graphical and textual areas of a document. Examples obtained on technical documents and magazines drawn from benchmark databases recognized by the community validate the approach.
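The abstract does not describe how the graph is built or which structural rules are applied, so the following is only a minimal illustrative sketch of the general idea of graph-based text/non-text separation, not the authors' method. It assumes a binarized page (1 = ink, 0 = background), treats each connected component as a graph node, links two components when they are close and of similar height, and labels connected groups of several small components as text. The names `connected_components`, `gap` and `separate_text`, and the thresholds `max_text_height`, `height_ratio`, `gap_factor` and `min_group`, are hypothetical choices made for this illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Component:
    """Bounding box of one connected component (inclusive pixel coordinates)."""
    top: int
    left: int
    bottom: int
    right: int

    @property
    def height(self) -> int:
        return self.bottom - self.top + 1

    @property
    def width(self) -> int:
        return self.right - self.left + 1


def connected_components(img):
    """Return bounding boxes of 8-connected foreground (value 1) pixel groups."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            box = [y, x, y, x]  # top, left, bottom, right
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                box = [min(box[0], cy), min(box[1], cx), max(box[2], cy), max(box[3], cx)]
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(Component(*box))
    return comps


def gap(a, b):
    """Number of empty rows/columns between two boxes (0 if they touch or overlap)."""
    dx = max(0, max(a.left, b.left) - min(a.right, b.right) - 1)
    dy = max(0, max(a.top, b.top) - min(a.bottom, b.bottom) - 1)
    return max(dx, dy)


def separate_text(img, max_text_height=10, height_ratio=2.0, gap_factor=1.0, min_group=3):
    """Label each component "text" or "non-text" (all thresholds are hypothetical)."""
    comps = connected_components(img)
    parent = list(range(len(comps)))  # union-find over graph nodes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Add an edge between components of similar height that lie close together.
    for i in range(len(comps)):
        for j in range(i + 1, len(comps)):
            a, b = comps[i], comps[j]
            similar = max(a.height, b.height) <= height_ratio * min(a.height, b.height)
            close = gap(a, b) <= gap_factor * max(a.height, b.height)
            if similar and close:
                parent[find(i)] = find(j)

    # Each connected subgraph is a candidate region.
    groups = {}
    for i in range(len(comps)):
        groups.setdefault(find(i), []).append(i)

    labels = [""] * len(comps)
    for members in groups.values():
        is_text = len(members) >= min_group and all(
            comps[m].height <= max_text_height for m in members)
        for m in members:
            labels[m] = "text" if is_text else "non-text"
    return comps, labels
```

The pairwise edge test is quadratic in the number of components, which keeps the sketch short; a practical implementation would usually restrict candidate edges, for example with a k-nearest-neighbor or Delaunay neighborhood graph.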

Keywords

Segmentation · Text/non-text separation · Document image · Graph modeling · Structural analysis

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Fattah Zirari (1, 2)
  • Driss Mammass (1)
  • Abdellatif Ennaji (2)
  • Stephane Nicolas (2)
  1. Laboratory IRF-SIC, Ibn Zohr University, Agadir, Morocco
  2. Laboratory LITIS, University of Rouen, Rouen, France