Abstract
The on-going growth of e-business has resulted in companies having to manage an ever increasing number of product, packaging and promotional images. Systems for indexing and retrieving such images are required in order to ensure image libraries can be managed and fully exploited as valuable business resources. In this paper, we explore the power of text recognition for e-business image management and propose an innovative system based on photo OCR. Photo OCR has been actively studied for scene text recognition but has not been exploited for e-business digital image management. Besides the well known difficulties in scene text recognition such as various size, location, orientation in text and cluttered background, e-business images typically feature text with extremely diverse fonts, and the characters are often artistically modified in shape, colour and arrangement. To address these challenges, our system takes advantage of the combinatorial power of deep neural networks and MSER processing. The cosine distance and n-gram vectors are used during retrieval for matching detected text to queries to provide tolerance to the inevitable transcription errors in text recognition. To evaluate our proposed system, we prepared a novel dataset designed specifically to reflect the challenges associated with text in e-business images. We compared our system with two other approaches for scene text recognition, and the results show our system outperforms other state-of-the-art on the new challenging dataset. Our system demonstrates that recognizing text embedded in images can be hugely beneficial for digital asset management.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
The dataset is available for download from https://github.com/jiang-public/mmm2018.
- 2.
- 3.
References
Bissacco, A., Cummins, M., Netzer, Y., Neven, H.: PhotoOCR: reading text in uncontrolled conditions. In: Proceedings of the 2013 IEEE International Conference on Computer Vision. ICCV 2013, pp. 785–792. IEEE Computer Society, Washington, DC (2013)
Bušta, M., Neumann, L., Matas, J.: FASText: efficient unconstrained scene text detector. In: 2015 IEEE International Conference on Computer Vision (ICCV 2015), pp. 1206–1214. IEEE, California, December 2015
Chen, H., Tsai, S.S., Schroth, G., Chen, D.M., Grzeszczuk, R., Girod, B.: Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In: 2011 IEEE International Conference on Image Processing. Brussels, September 2011
Deselaers, T., Keysers, D., Ney, H.: Features for image retrieval: an experimental comparison. Inf. Retrieval 11(2), 77–107 (2008)
Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: CVPR, pp. 2963–2970. IEEE (2010)
Forssén, P.E.: Maximally stable colour regions for recognition and matching. In: IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, IEEE, Minneapolis, June 2007
Gómez, L., Karatzas, D.: Multi-script text extraction from natural scenes. In: Proceedings of the 2013 12th International Conference on Document Analysis and Recognition, pp. 467–471. ICDAR 2013. IEEE Computer Society, Washington, DC (2013)
He, T., Huang, W., Qiao, Y., Yao, J.: Text-attentional convolutional neural network for scene text detection. Trans. Img. Proc. 25(6), 2529–2541 (2016)
Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Synthetic data and artificial neural networks for natural scene text recognition. In: NIPS Deep Learning Workshop (2014)
Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Reading text in the wild with convolutional neural networks. Int. J. Comput. Vis. 116(1), 1–20 (2016)
Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 512–528. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10593-2_34
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S.K., Bagdanov, A.D., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., Shafait, F., Uchida, S., Valveny, E.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160. IEEE Computer Society (2015). Relocated from Tunis, Tunisia
Koo, H.I., Kim, D.H.: Scene text detection via connected component clustering and nontext filtering. IEEE Trans. Image Process. 22(6), 2296–2305 (2013)
Lew, M.S., Sebe, N., Djeraba, C., Jain, R.: Content-based multimedia information retrieval: state of the art and challenges. ACM Trans. Multimedia Comput. Commun. Appl. 2(1), 1–19 (2006)
Li, Y., Lu, H.: Scene text detection via stroke width. In: Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), pp. 681–684, November 2012
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: Proceedings of the British Machine Vision Conference, pp. 36.1–36.10. BMVA Press (2002). https://doi.org/10.5244/C.16.36
Neumann, L., Matas, J.: Scene text localization and recognition with oriented stroke detection. In: 2013 IEEE International Conference on Computer Vision (ICCV 2013), pp. 97–104. IEEE, California, December 2013
Neumann, L., Matas, J.: Efficient scene text localization and recognition with local character refinement. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 746–750. IEEE, California, August 2015
Neumann, L., Matas, J.: Real-time lexicon-free scene text localization and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 38(9), 1872–1885 (2016)
Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. CoRR abs/1507.05717 (2015)
Smith, R.: An overview of the Tesseract OCR engine. In: Proceedings of the Ninth International Conference on Document Analysis and Recognition - Volume 02, pp. 629–633. ICDAR 2007. IEEE Computer Society, Washington, DC (2007)
Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: 2011 International Conference on Computer Vision, pp. 1457–1464, November 2011
Acknowledgements
This research has been supported by Science Foundation Ireland under grant number SFI/12/RC/2289 which is co-funded by the European Regional Development Fund.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Zhou, J., McGuinness, K., O’Connor, N.E. (2018). A Text Recognition and Retrieval System for e-Business Image Management. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science(), vol 10705. Springer, Cham. https://doi.org/10.1007/978-3-319-73600-6_3
Download citation
DOI: https://doi.org/10.1007/978-3-319-73600-6_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73599-3
Online ISBN: 978-3-319-73600-6
eBook Packages: Computer ScienceComputer Science (R0)