Advertisement

Scene text detection with fully convolutional neural networks

  • Zhandong Liu
  • Wengang Zhou
  • Houqiang LiEmail author
Article
  • 30 Downloads

Abstract

Text detection in scene image has become a hot topic in computer vision and artificial intelligence research, due to its wide range of applications and challenges. Most state-of-the-art methods for text detection based on deep learning rely on text bounding box regression. These methods can not well handle the case that if the scene text is curved. In this paper, we propose a new framework for arbitrarily oriented text detection in natural images based on fully convolutional neural networks. The main idea is to represent a text instance by two forms: text center block and word stroke region. These two elements are detected by two fully convolutional networks, respectively. Final detections are produced by the word region surrounding box algorithm. The proposed method does not need to regress the extant bounding box of the text instance, mainly because the predicted text block region itself implicitly contains position and orientation information. Besides, our method can well handle text in different languages, arbitrary orientations, curved shape and various fonts. To validate the effectiveness of the proposed method, we perform experiments on three public datasets: MSRA-TD500, USTB-SV1K and ICDAR2013, and compare it with other state-of-the-art methods. Experiment results demonstrate that the proposed method achieves competitive results. Based on VGG-16, our method achieves an F-measure of 78.84% on MSRA-TD500, 59.34% on USTB-SV1K, and 88.21% on ICDAR2013.

Keywords

Scene text detection Semantic segmentation Multi-orientation Word stroke region Text center block 

Notes

Acknowledgements

This work is supported in part to Zhandong Liu by Natural Science Foundation of China under contract No. 61662082 and No. U1703261, and in part to Dr. Wengang Zhou by Natural Science Foundation of China under contract No. 61632019, the Fundamental Research Funds for the Central Universities, and Young Elite Scientists Sponsorship Program By CAST (No. 2016QNRC001), and in part to Dr. Houqiang Li by Natural Science Foundation of China under contract No. 61836011 and No. 61390514.

References

  1. 1.
    Bai X, Yang M, Lyu P, Xu Y (2017) Integrating scene text and visual appearance for fine-grained image classification with convolutional neural networks. arXiv:1704.04613
  2. 2.
    Busta M, Neumann L, Matas J (2015) Fastext: efficient unconstrained scene text detector. In: IEEE international conference on computer vision, pp 1206–1214Google Scholar
  3. 3.
    Cho H, Sung M, Jun B (2016) Canny text detector: fast and robust scene text localization algorithm. In: IEEE conference on computer vision and pattern recognition, pp 3566–3573Google Scholar
  4. 4.
    Deng D, Liu H, Li X, Cai D (2018) Pixellink: detecting scene text via instance segmentation. arXiv:1801.01315
  5. 5.
    Eom S, Huh JH (2018) Group signature with restrictive linkability: minimizing privacy exposure in ubiquitous environment. J Ambient Intell Humaniz Comput: 1–11Google Scholar
  6. 6.
    Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: IEEE conference on computer vision and pattern recognition, pp 2963–2970Google Scholar
  7. 7.
    He D, Yang X, Liang C, Zhou Z, Ororbia AG, Kifer D, Giles CL (2017) Multi-scale fcn with cascaded instance aware segmentation for arbitrary oriented word spotting in the wild. In: IEEE conference on computer vision and pattern recognition, pp 3519–3528Google Scholar
  8. 8.
    He P, Huang W, He T, Zhu Q, Qiao Y, Li X (2017) Single shot text detector with regional attention. In: IEEE international conference on computer vision, pp 3047–3055Google Scholar
  9. 9.
    He T, Huang W, Qiao Y, Yao J (2016) Accurate text localization in natural image with cascaded convolutional text network. arXiv:1603.09423
  10. 10.
    Huang L, Yang Y, Deng Y, Yu Y (2015) Densebox: unifying landmark localization with end to end object detection. arXiv:1509.04874
  11. 11.
    Huang W, Lin Z, Yang J, Wang J (2013) Text localization in natural images using stroke feature transform and text covariance descriptors. In: IEEE international conference on computer vision, pp 1241–1248Google Scholar
  12. 12.
    Huang W, Qiao Y, Tang X (2014) Robust scene text detection with convolution neural network induced mser trees. In: European conference on computer vision, pp 497–511Google Scholar
  13. 13.
    Huh JH, Otgonchimeg S, Seo K (2016) Advanced metering infrastructure design and test bed experiment using intelligent agents: focusing on the plc network base technology for smart grid system. J Supercomput 72(5):1862–1877CrossRefGoogle Scholar
  14. 14.
    Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting. In: European conference on computer vision, pp 512–528Google Scholar
  15. 15.
    Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: ACM international conference on multimedia, pp 675–678Google Scholar
  16. 16.
    Jiang Y, Zhu X, Wang X, Yang S, Li W, Wang H, Fu P, Luo Z (2017) R2cnn: rotational region cnn for orientation robust scene text detection. arXiv:1706.09579
  17. 17.
    Kang L, Li Y, Doermann D (2014) Orientation robust text line detection in natural images. In: IEEE conference on computer vision and pattern recognition, pp 4034–4041Google Scholar
  18. 18.
    Karaoglu S, Tao R, Gevers T, Smeulders AW (2017) Words matter: scene text for image classification and retrieval. IEEE Trans Multimedia 19(5):1063–1076CrossRefGoogle Scholar
  19. 19.
    Khare V, Shivakumara P, Paramesran R, Blumenstein M (2017) Arbitrarily-oriented multi-lingual text detection in video. Multimed Tools Appl 76 (15):16625–16655CrossRefGoogle Scholar
  20. 20.
    Liao M, Shi B, Bai X, Wang X, Liu W (2017) Textboxes: a fast text detector with a single deep neural network. In: AAAI conference on artificial intelligence, pp 4161–4167Google Scholar
  21. 21.
    Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: IEEE international conference on computer vision, pp 2980–2988Google Scholar
  22. 22.
    Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision, pp 21–37Google Scholar
  23. 23.
    Liu X, Liang D, Yan S, Chen D, Qiao Y, Yan J (2018) Fots: fast oriented text spotting with a unified network. arXiv:1801.01671
  24. 24.
    Liu Y, Jin L (2017) Deep matching prior network: toward tighter multi-oriented text detection. In: IEEE conference on computer vision and pattern recognition, vol 2, p 8Google Scholar
  25. 25.
    Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: IEEE conference on computer vision and pattern recognition, pp 3431–3440Google Scholar
  26. 26.
    Loukhaoukha K, Chouinard JY, Berdai A (2012) A secure image encryption algorithm based on rubik’s cube principle. J Electr Comput Eng 2012:7MathSciNetzbMATHGoogle Scholar
  27. 27.
    Matas J, Chum O, Urban M, Pajdla T (2004) Robust wide-baseline stereo from maximally stable extremal regions. Image Vis Comput 22(10):761–767CrossRefGoogle Scholar
  28. 28.
    Nikoloudakis Y, Panagiotakis S, Markakis E, Pallis E, Mastorakis G, Mavromoustakis CX, Dobre C (2016) A fog-based emergency system for smart enhanced living environments. IEEE Cloud Computing 3(6): 54–62Google Scholar
  29. 29.
    Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99Google Scholar
  30. 30.
    Shi B, Bai X, Belongie S (2017) Detecting oriented text in natural images by linking segments. In: IEEE conference on computer vision and pattern recognition, pp 2550–2558Google Scholar
  31. 31.
    Shi B, Bai X, Yao C (2017) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans Pattern Anal Mach Intell 39(11):2298–2304CrossRefGoogle Scholar
  32. 32.
    Tian S, Pan Y, Huang C, Lu S, Yu K, Lim Tan C (2015) Text flow: a unified text detection system in natural scene images. In: IEEE international conference on computer vision, pp 4651–4659Google Scholar
  33. 33.
    Tian S, Pei WY, Zuo ZY, Yin X (2016) Scene text detection in video by learning locally and globally. In: International joint conference on artificial intelligence, pp 2647–2653Google Scholar
  34. 34.
    Wolf C, Jolion JM (2006) Object count/area graphs for the evaluation of object detection and segmentation algorithms. Int J Doc Anal Recognit 8(4):280–296CrossRefGoogle Scholar
  35. 35.
    Xie S, Tu Z (2015) Holistically-nested edge detection. In: IEEE international conference on computer vision, pp 1395–1403Google Scholar
  36. 36.
    Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images. In: IEEE conference on computer vision and pattern recognition, pp 1083–1090Google Scholar
  37. 37.
    Yao C, Bai X, Sang N, Zhou X, Zhou S, Cao Z (2016) Scene text detection via holistic, multi-channel prediction. arXiv:1606.09002
  38. 38.
    Yi C, Tian Y (2011) Assistive text reading from complex background for blind persons. In: International workshop on camera-based document analysis and recognition, pp 15–28Google Scholar
  39. 39.
    Yin X, Pei WY, Zhang J, Hao HW (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Anal Mach Intell 37 (9):1930–1937CrossRefGoogle Scholar
  40. 40.
    Yin X, Yin X, Huang K, Hao HW (2014) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983CrossRefGoogle Scholar
  41. 41.
    Zhang Z, Shen W, Yao C, Bai X (2015) Symmetry-based text line detection in natural scenes. In: IEEE conference on computer vision and pattern recognition, pp 2558–2567Google Scholar
  42. 42.
    Zhang Z, Zhang C, Shen W, Yao C, Liu W, Bai X (2016) Multi-oriented text detection with fully convolutional networks. In: IEEE conference on computer vision and pattern recognition, pp 4159–4167Google Scholar
  43. 43.
    Zhou X, Yao C, Wen H, Wang Y, Zhou S, He W, Liang J (2017) East: an efficient and accurate scene text detector. In: IEEE conference on computer vision and pattern recognition, pp 2642–2651Google Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. 1.CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Department of Electronic Engineering and Information ScienceUniversity of Science and Technology of ChinaHefeiChina

Personalised recommendations