A Text Recognition and Retrieval System for e-Business Image Management

  • Conference paper
MultiMedia Modeling (MMM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10705)

Abstract

The ongoing growth of e-business has left companies managing an ever-increasing number of product, packaging and promotional images. Systems for indexing and retrieving such images are required to ensure that image libraries can be managed and fully exploited as valuable business resources. In this paper, we explore the power of text recognition for e-business image management and propose an innovative system based on photo OCR. Photo OCR has been actively studied for scene text recognition but has not been exploited for e-business digital image management. Besides the well-known difficulties in scene text recognition, such as variation in text size, location and orientation and cluttered backgrounds, e-business images typically feature text in extremely diverse fonts, and the characters are often artistically modified in shape, colour and arrangement. To address these challenges, our system combines the strengths of deep neural networks and MSER processing. During retrieval, detected text is matched to queries using the cosine distance between n-gram vectors, which provides tolerance to the inevitable transcription errors in text recognition. To evaluate the proposed system, we prepared a novel dataset designed specifically to reflect the challenges associated with text in e-business images. We compared our system with two other approaches to scene text recognition, and the results show that it outperforms these state-of-the-art approaches on the challenging new dataset. Our system demonstrates that recognizing text embedded in images can be hugely beneficial for digital asset management.
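The error-tolerant matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the use of character bigrams, the space padding, the lower-casing, and the function names `ngram_vector`, `cosine_distance` and `rank` are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import Counter
from math import sqrt


def ngram_vector(text: str, n: int = 2) -> Counter:
    """Count character n-grams of a lower-cased string.

    Padding with one space on each side (an assumption, not from the
    paper) lets the first and last characters contribute n-grams too.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def rank(query: str, transcripts: list[str], n: int = 2) -> list[tuple[float, str]]:
    """Sort OCR transcripts by ascending cosine distance to the query."""
    q = ngram_vector(query, n)
    return sorted((cosine_distance(q, ngram_vector(t, n)), t) for t in transcripts)


if __name__ == "__main__":
    # "choco1ate" simulates a single-character OCR error (l read as 1).
    for distance, text in rank("chocolate", ["shampoo", "choco1ate", "chocolate"]):
        print(f"{distance:.2f}  {text}")
```

In this sketch a transcript with one misread character still shares most of its bigrams with the query, so it ranks close to the exact match, whereas an exact string comparison would reject it outright.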

Notes

  1. The dataset is available for download from https://github.com/jiang-public/mmm2018.

  2. https://github.com/subokita/Robust-Text-Detection.

  3. http://docs.opencv.org/3.0-beta/modules/text/doc/text.html.

Acknowledgements

This research has been supported by Science Foundation Ireland under grant number SFI/12/RC/2289 which is co-funded by the European Regional Development Fund.

Author information

Corresponding author

Correspondence to Jiang Zhou.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Zhou, J., McGuinness, K., O’Connor, N.E. (2018). A Text Recognition and Retrieval System for e-Business Image Management. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10705. Springer, Cham. https://doi.org/10.1007/978-3-319-73600-6_3

  • DOI: https://doi.org/10.1007/978-3-319-73600-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73599-3

  • Online ISBN: 978-3-319-73600-6

  • eBook Packages: Computer Science (R0)
