Word Spotting in the Wild

  • Kai Wang
  • Serge Belongie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6311)


We present a method for spotting words in the wild, i.e., in real images taken in unconstrained environments. Text found in the wild has a surprising range of difficulty. At one end of the spectrum, Optical Character Recognition (OCR) applied to scanned pages of well-formatted printed text is one of the most successful applications of computer vision to date. At the other extreme lie visual CAPTCHAs – text that is constructed explicitly to fool computer vision algorithms. Both tasks involve recognizing text, yet one is nearly solved while the other remains extremely challenging. In this work, we argue that the appearance of words in the wild spans this range of difficulties, and we propose a new word recognition approach based on state-of-the-art methods from generic object recognition, in which we consider the object categories to be the words themselves. We compare the performance of leading OCR engines – one open source and one proprietary – with our new approach on the ICDAR Robust Reading data set and on a new word spotting data set we introduce in this paper: the Street View Text data set. We show improvements of up to 16% on these data sets, demonstrating the feasibility of a new approach to a seemingly old problem.


Keywords (machine-generated, not supplied by the authors): Word Recognition; Image Text; Optical Character Recognition; Multiple Kernel Learn; Street View



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Kai Wang (1)
  • Serge Belongie (1)
  1. Department of Computer Science and Engineering, University of California, San Diego
