Learning Visual Shape Lexicon for Document Image Content Recognition

  • Guangyu Zhu
  • Xiaodong Yu
  • Yi Li
  • David Doermann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5303)


Developing effective content recognition methods for diverse imagery continues to challenge computer vision researchers. We present a new approach to document image content categorization using a lexicon of shape features. Each lexical word corresponds to a scale- and rotation-invariant shape feature that is generic enough to be detected repeatably and without segmentation. We learn a concise, structurally indexed shape lexicon from training data by clustering and partitioning feature types through graph cuts. We demonstrate our approach on two challenging document image content recognition problems: 1) the classification of 4,500 Web images crawled from Google Image Search into three content categories (pure image, image with text, and document image), and 2) language identification among 8 languages (Arabic, Chinese, English, Hindi, Japanese, Korean, Russian, and Thai) on a database of 1,512 complex document images composed of mixed machine-printed text and handwriting. Our approach is capable of handling high intra-class variability and shows results that exceed other state-of-the-art approaches, allowing it to be used as a content recognizer in image indexing and retrieval systems.
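The abstract does not spell out how a learned lexicon is applied at recognition time. As a rough illustration only, the sketch below shows one common way a feature lexicon can be used: each extracted feature is assigned to its nearest lexicon word, the assignments are pooled into a normalized histogram, and the histogram is compared against per-class reference histograms. The function names, the 2-D toy descriptors, the Euclidean and L1 distances, and the reference histograms are all assumptions made for this example; this is not the authors' algorithm, and it does not cover the graph-cut lexicon learning step.

```python
import math
from collections import Counter


def nearest_word(feature, lexicon):
    """Return the index of the lexicon entry closest to `feature` (Euclidean)."""
    return min(range(len(lexicon)), key=lambda i: math.dist(feature, lexicon[i]))


def word_histogram(features, lexicon):
    """Quantize features to lexicon words and return a normalized histogram."""
    counts = Counter(nearest_word(f, lexicon) for f in features)
    total = sum(counts.values())
    return [counts[i] / total if total else 0.0 for i in range(len(lexicon))]


def classify(features, lexicon, class_histograms):
    """Pick the class whose reference histogram is closest in L1 distance."""
    hist = word_histogram(features, lexicon)

    def l1(label):
        return sum(abs(a - b) for a, b in zip(hist, class_histograms[label]))

    return min(class_histograms, key=l1)


# Hypothetical toy data: 2-D stand-ins for shape descriptors.
lexicon = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
class_histograms = {
    "document image": [0.1, 0.1, 0.8],
    "pure image": [0.7, 0.2, 0.1],
}
query = [(4.8, 5.1), (5.2, 4.9), (0.1, -0.1), (5.0, 5.3)]
label = classify(query, lexicon, class_histograms)
```

In this toy run three of the four query features fall nearest the third lexicon word, so the query histogram is closest to the hypothetical "document image" reference. A real system would use higher-dimensional shape descriptors and a trained classifier rather than a nearest-histogram rule.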


Keywords (machine-generated, not supplied by the authors): Local Binary Pattern, Pattern Anal, Template Match, Document Image, Text Line



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Guangyu Zhu (1)
  • Xiaodong Yu (1)
  • Yi Li (1)
  • David Doermann (1)
  1. University of Maryland, College Park, USA
