Anchor-free multi-orientation text detection in natural scene images

Abstract

Text detection in natural scene images is a key prerequisite for computer vision applications such as image search, navigation for the visually impaired, autonomous driving, and multilingual translation. Existing text detection methods detect only part of large-scale text instances and struggle to detect small-scale text. To address this problem, we propose an anchor-free multi-orientation text detection method. First, a Feature Pyramid Network (FPN) combines multiple feature layers of a Convolutional Neural Network (CNN) to predict the geometric properties of text; this enlarges the receptive field of each pixel and thus helps detect more large-scale text. Second, we design a new loss function that is independent of text scale, which gives pixels inside small-scale text a larger weight and thereby facilitates the detection of small-scale text. Finally, the pixel-level semantic segmentation results are used to filter out obviously unreasonable candidate text boxes, which improves both the precision and the recall of text detection. Experimental results on ICDAR 2015 and MSRA-TD500 demonstrate the good performance of our method.
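The abstract does not give the exact form of the scale-independent loss or of the segmentation-based filtering rule. The sketch below shows one common way to realise both ideas, under our own assumptions: each pixel of a text instance is weighted by the inverse of that instance's area, so every instance contributes equally to the loss however small it is, and a candidate quadrilateral is kept only if the mean segmentation score at the pixel centres it covers reaches a threshold (0.5 here, a hypothetical value). The function names (`scale_balanced_weights`, `weighted_l1`, `filter_boxes`) are hypothetical; this is an illustration, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

InstanceMap = List[List[int]]         # 0 = background, k > 0 = pixel belongs to text instance k
FloatMap = List[List[float]]          # per-pixel values (predictions, targets, scores, weights)
Quad = Sequence[Tuple[float, float]]  # candidate box as (x, y) vertices, any orientation


def scale_balanced_weights(instances: InstanceMap) -> FloatMap:
    """Weight each text pixel by 1 / (area of its instance).

    Every text instance then has a total weight of 1.0, so a tiny word
    counts as much as a huge one. Background pixels get weight 0.0.
    (Assumed weighting scheme, not necessarily the paper's loss.)
    """
    areas: Dict[int, int] = {}
    for row in instances:
        for k in row:
            if k > 0:
                areas[k] = areas.get(k, 0) + 1
    return [[1.0 / areas[k] if k > 0 else 0.0 for k in row] for row in instances]


def weighted_l1(pred: FloatMap, target: FloatMap, weights: FloatMap) -> float:
    """L1 regression loss averaged per text instance rather than per pixel."""
    num, den = 0.0, 0.0
    for p_row, t_row, w_row in zip(pred, target, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            num += w * abs(p - t)
            den += w
    return num / den if den > 0 else 0.0


def point_in_polygon(x: float, y: float, poly: Quad) -> bool:
    """Even-odd ray-casting test, valid for rotated quadrilaterals."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_boxes(boxes: List[Quad], score: FloatMap, thresh: float = 0.5) -> List[Quad]:
    """Keep boxes whose covered pixel centres have a mean text score >= thresh."""
    kept = []
    for box in boxes:
        vals = [
            s
            for r, row in enumerate(score)
            for c, s in enumerate(row)
            if point_in_polygon(c + 0.5, r + 0.5, box)
        ]
        if vals and sum(vals) / len(vals) >= thresh:
            kept.append(box)
    return kept


if __name__ == "__main__":
    # A large 8-pixel word (instance 1) and a 1-pixel word (instance 2).
    inst = [[1, 1, 1, 1, 0, 0],
            [1, 1, 1, 1, 0, 2],
            [0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0]]
    target = [[1.0 if k else 0.0 for k in row] for row in inst]
    pred = [[0.5 if k == 1 else 0.0 for k in row] for row in inst]
    w = scale_balanced_weights(inst)
    # Instance 1 contributes 0.5, instance 2 contributes 1.0; their average is 0.75.
    print(weighted_l1(pred, target, w))

    score = [[0.9 if k == 1 else 0.1 for k in row] for row in inst]
    box_a = [(0, 0), (4, 0), (4, 2), (0, 2)]  # covers the large word
    box_b = [(0, 2), (6, 2), (6, 4), (0, 4)]  # covers background only
    print(filter_boxes([box_a, box_b], score) == [box_a])
```

In this toy case a plain per-pixel mean over the nine text pixels would give (8 × 0.5 + 1) / 9 ≈ 0.56, a value dominated by the large word; the per-instance weighting instead lets the completely missed one-pixel word account for half of the loss.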


Figs. 1–7 (images not reproduced)


Funding

This work was supported in part by the Natural Science Foundation of Lingnan Normal University under Grant QL1307, in part by the Key Laboratory of Special Child Development and Education of Guangdong Province, in part by the National Social Science Foundation of China under Grant 61302399, in part by the Natural Science Foundation of China under Grant 61962038, and in part by the Guangxi Bagui Teams for Innovation and Research.

Author information

Corresponding authors

Correspondence to Faliang Huang or Yaohua Yi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lu, L., Wu, D., Wu, T. et al. Anchor-free multi-orientation text detection in natural scene images. Appl Intell (2020). https://doi.org/10.1007/s10489-020-01742-z

Keywords

  • Text detection
  • Natural scene image
  • Anchor-free
  • Convolutional Neural Network