Enhancing Scene Text Detection via Fused Semantic Segmentation Network with Attention

  • Chao Liu
  • Yuexian Zou
  • Dongming Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11295)


Scene text detection (STD) in natural images remains challenging because text objects vary widely in font, scale and orientation. Deep-learning-based methods represent the state of the art; PixelLink, for example, achieves 85% accuracy on the ICDAR 2015 benchmark. Our preliminary experiments with PixelLink show that its detection errors come mainly from two sources: missed small-scale text objects and missed ambiguous text objects. In this paper, following the PixelLink framework, we aim to improve STD performance by designing a new fused semantic segmentation network with attention. Specifically, an inception module extracts features with multi-scale receptive fields to enhance the feature representation. A hierarchical feature fusion module is cascaded with the inception module to combine multi-level inception features and capture more semantic information. Finally, to suppress background disturbance and better localize text objects, an attention module learns a probability heat map of text, which helps infer text accurately even when it is ambiguous. Experimental results on three public benchmarks demonstrate the effectiveness of the proposed method compared with state-of-the-art methods. The proposed method obtains the highest F-measure on ICDAR 2015, ICDAR 2013 and MSRA-TD500, at the cost of higher computation.
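The role of the attention module can be illustrated with a small sketch. The abstract states only that the module learns a probability heat map of text; how that map is combined with the features is not given here, so the element-wise weighting below is an assumption, and the names `text_heat_map` and `apply_attention` are hypothetical. The sketch uses plain Python lists in place of tensors and is not the authors' implementation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def text_heat_map(logits):
    """Turn per-pixel text logits (H x W) into a probability heat map."""
    return [[sigmoid(v) for v in row] for row in logits]


def apply_attention(features, heat_map):
    """Scale every channel of a C x H x W feature map by the H x W heat map.

    Assumption: attention is applied as element-wise multiplication,
    so low-probability (background) pixels are suppressed.
    """
    return [
        [
            [f * a for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(channel, heat_map)
        ]
        for channel in features
    ]


# Toy input: 2 channels over a 2 x 2 grid (values are illustrative only).
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
# Positive logits mark likely text pixels, negative logits mark background.
logits = [[4.0, -4.0], [0.0, 4.0]]

heat = text_heat_map(logits)
attended = apply_attention(features, heat)
```

In this toy case the top-right pixel has a strongly negative logit, so its features are scaled towards zero in every channel, while pixels with positive logits are left almost unchanged.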


Keywords: Scene text detection (STD), Semantic segmentation, Hierarchical feature fusion, Attention mechanism



This paper was partially supported by the Shenzhen Science & Technology Fundamental Research Program (No. JCYJ20160330095814461) and the Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality (ZDSYS201703031405467). Special acknowledgement is given to the Aoto-PKUSZ Joint Research Center of Artificial Intelligence on Scene Cognition & Technology Innovation for its support.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. ADSPLAB, School of ECE, Peking University, Shenzhen, China
  2. Peng Cheng Laboratory, Shenzhen, China
