Text Localization in Historical Document Images with Local Binary Patterns and Variance Models

  • Tapan Kumar Bhowmik
  • Manika Kar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8251)

Abstract

In this paper, we explore the use of Local Binary Pattern (LBP) descriptors and a local variance measure to develop efficient techniques for segmenting a large collection of historical machine-printed document pages. The segmentation results help organize the pages in a structured format, which is useful in applications such as access to historical documents. In our experiments, three reference models (background, text, and image) are used to segment various kinds of non-text information along with the text. The method was tested on an archive of Portuguese historical documents and shows promising results.
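The abstract does not say exactly how the LBP and variance features become background, text and image models, so the following is only a minimal sketch of one plausible reading, not the authors' method. It computes the basic 8-neighbour LBP code and the local variance VAR (the mean squared deviation of the neighbours from their mean, as in Ojala, Pietikäinen and Mäenpää, 2002). Each block is then summarised as a joint LBP/VAR histogram and assigned to the nearest reference model by chi-square distance. The neighbour ordering, the VAR quantisation thresholds in `VAR_EDGES`, the chi-square matching, the helper names (`lbp_var`, `joint_histogram`, `classify`, `make_block`) and the synthetic reference blocks are all hypothetical choices made for illustration.

```python
from collections import Counter

# 8 neighbours at radius 1. This ordering (start to the right, go
# counter-clockwise) is an assumption; any fixed order works.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

# Hypothetical VAR quantisation thresholds (5 bins); not taken from the paper.
VAR_EDGES = (10.0, 100.0, 1000.0, 10000.0)


def lbp_var(img, r, c):
    """Return the 8-bit LBP code and the local variance VAR at pixel (r, c)."""
    center = img[r][c]
    neigh = [img[r + dr][c + dc] for dr, dc in OFFSETS]
    # Bit i is set when neighbour i is at least as bright as the centre.
    code = sum(1 << i for i, g in enumerate(neigh) if g >= center)
    mean = sum(neigh) / len(neigh)
    var = sum((g - mean) ** 2 for g in neigh) / len(neigh)
    return code, var


def var_bin(v):
    """Quantise a variance value into one of len(VAR_EDGES) + 1 bins."""
    return sum(v >= edge for edge in VAR_EDGES)


def joint_histogram(block):
    """Normalised joint LBP/VAR histogram over the block interior (block >= 3x3)."""
    counts = Counter()
    for r in range(1, len(block) - 1):
        for c in range(1, len(block[0]) - 1):
            code, var = lbp_var(block, r, c)
            counts[(code, var_bin(var))] += 1
    n = sum(counts.values())
    return {key: cnt / n for key, cnt in counts.items()}


def chi_square(h1, h2):
    """Chi-square distance between two sparse histograms (dicts of bin -> freq)."""
    dist = 0.0
    for key in set(h1) | set(h2):
        a, b = h1.get(key, 0.0), h2.get(key, 0.0)
        dist += (a - b) ** 2 / (a + b)  # a + b > 0 for every key present
    return dist


def classify(block, models):
    """Label a block with the name of the nearest reference model."""
    hist = joint_histogram(block)
    return min(models, key=lambda name: chi_square(hist, models[name]))


def make_block(h, w, f):
    """Build an h x w grey-level block from a function f(row, col)."""
    return [[f(r, c) for c in range(w)] for r in range(h)]


# Toy reference models built from synthetic 8x8 blocks (illustrative only):
# flat paper, alternating dark/bright rows for text, a smooth ramp for images.
models = {
    "background": joint_histogram(make_block(8, 8, lambda r, c: 240)),
    "text": joint_histogram(make_block(8, 8, lambda r, c: 0 if r % 2 else 255)),
    "image": joint_histogram(make_block(8, 8, lambda r, c: 10 * c + 5 * r)),
}

print(classify(make_block(6, 6, lambda r, c: 180), models))
print(classify(make_block(6, 6, lambda r, c: 255 if r % 2 else 0), models))
print(classify(make_block(6, 6, lambda r, c: 8 * c + 4 * r), models))
```

The three calls print `background`, `text` and `image`. The toy data also shows why a variance term is useful alongside LBP: a pixel on flat background and a dark pixel inside a text stroke both receive the LBP code 255 (every neighbour is at least as bright as the centre), and only their VAR values tell them apart. On real scans, the reference histograms would be learned from labelled page regions rather than synthetic patterns.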

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Tapan Kumar Bhowmik, LITIS EA-4108, Université de Rouen, France
  • Manika Kar, Departamento de Engenharia Informática, Universidade do Porto, Portugal

Personalised recommendations