Word Spotting in the Wild

Wang, Kai; Belongie, Serge

doi:10.1007/978-3-642-15549-9_43

Kai Wang & Serge Belongie

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6311)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We present a method for spotting words in the wild, i.e., in real images taken in unconstrained environments. Text found in the wild has a surprising range of difficulty. At one end of the spectrum, Optical Character Recognition (OCR) applied to scanned pages of well-formatted printed text is one of the most successful applications of computer vision to date. At the other extreme lie visual CAPTCHAs: text constructed explicitly to fool computer vision algorithms. Both tasks involve recognizing text, yet one is nearly solved while the other remains extremely challenging. In this work, we argue that the appearance of words in the wild spans this range of difficulties, and we propose a new word recognition approach based on state-of-the-art methods from generic object recognition, in which the object categories are the words themselves. We compare the performance of two leading OCR engines (one open source and one proprietary) with our new approach on the ICDAR Robust Reading data set and on a new word spotting data set introduced in this paper, the Street View Text data set. We show improvements of up to 16% on these data sets, demonstrating the feasibility of a new approach to a seemingly old problem.
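The idea of treating each word as its own object category can be illustrated with a small sketch. This is not the authors' implementation; it is an illustrative assumption throughout. It supposes that some character detector (hypothetical, not shown) has already produced, for each character, a list of (x-position, confidence) detections, and it scores every word in a small lexicon as a left-to-right chain of those detections, in the spirit of pictorial structures, then returns the best-scoring word. The function names `score_word` and `spot_word`, the gap limits, the quadratic spacing penalty, and the per-character averaging are all hypothetical choices.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical detector output: for each character, a list of
# (x_center_in_pixels, confidence) pairs.
Detection = Tuple[float, float]
NEG_INF = float("-inf")


def score_word(word: str,
               detections: Dict[str, List[Detection]],
               min_gap: float = 5.0,
               max_gap: float = 60.0,
               expected_gap: float = 30.0,
               deform_weight: float = 0.001) -> Optional[float]:
    """Score `word` as a left-to-right chain of character detections.

    Dynamic programming over the chain: each character is matched to one of
    its detections, consecutive characters must be min_gap..max_gap pixels
    apart, and deviations from expected_gap are penalized quadratically.
    Returns the best mean per-character score, or None if no valid chain exists.
    """
    if not word:
        return None
    # best[j]: best total score for word[:i+1] with its last character at prev_dets[j]
    prev_dets = detections.get(word[0], [])
    best = [conf for _, conf in prev_dets]
    for ch in word[1:]:
        cur_dets = detections.get(ch, [])
        cur_best: List[float] = []
        for x, conf in cur_dets:
            # Consider every valid predecessor placement and keep the best one.
            candidates = [
                b - deform_weight * (x - px - expected_gap) ** 2
                for (px, _), b in zip(prev_dets, best)
                if b != NEG_INF and min_gap <= x - px <= max_gap
            ]
            cur_best.append(conf + max(candidates) if candidates else NEG_INF)
        prev_dets, best = cur_dets, cur_best
    total = max(best, default=NEG_INF)
    if total == NEG_INF:
        return None
    # Average per character so longer words do not win just by summing more scores.
    return total / len(word)


def spot_word(lexicon: Sequence[str],
              detections: Dict[str, List[Detection]]) -> Optional[Tuple[str, float]]:
    """Treat each lexicon word as a category; return (word, score) of the best one."""
    best: Optional[Tuple[str, float]] = None
    for word in lexicon:
        s = score_word(word, detections)
        if s is not None and (best is None or s > best[1]):
            best = (word, s)
    return best


if __name__ == "__main__":
    # Made-up detections for illustration only.
    demo = {
        "c": [(10.0, 0.9)],
        "a": [(40.0, 0.8), (200.0, 0.95)],
        "t": [(70.0, 0.85)],
        "r": [(72.0, 0.3)],
    }
    result = spot_word(["cat", "car", "act"], demo)
    print(result[0] if result else None)  # prints: cat
```

Restricting the output to a lexicon is what makes this a spotting problem rather than open-vocabulary reading: a word that cannot be assembled from plausible detections (here "act", whose "c" would have to lie to the right of an "a") is simply rejected.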

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of California, San Diego
Kai Wang & Serge Belongie

Editor information

Editors and Affiliations

GRASP Laboratory, University of Pennsylvania, 3330 Walnut Street, 19104, Philadelphia, PA, USA
Kostas Daniilidis
School of Electrical and Computer Engineering, National Technical University of Athens, 15773, Athens, Greece
Petros Maragos
Department of Applied Mathematics, Ecole Centrale de Paris, Grande Voie des Vignes, 92295, Chatenay-Malabry, France
Nikos Paragios

About this paper

Cite this paper

Wang, K., Belongie, S. (2010). Word Spotting in the Wild. In: Daniilidis, K., Maragos, P., Paragios, N. (eds) Computer Vision – ECCV 2010. ECCV 2010. Lecture Notes in Computer Science, vol 6311. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15549-9_43

DOI: https://doi.org/10.1007/978-3-642-15549-9_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15548-2
Online ISBN: 978-3-642-15549-9
eBook Packages: Computer Science, Computer Science (R0)