Detecting Text in the Wild with Deep Character Embedding Network

  • Jiaming Li
  • Chengquan Zhang
  • Yipeng Sun
  • Junyu Han
  • Errui Ding
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11364)


Most text detection methods assume that text is horizontal or multi-oriented and therefore use quadrangles as the basic detection unit. However, text in the wild is often perspectively distorted or curved, which existing approaches cannot easily handle. In this paper, we propose a deep character embedding network (CENet) that simultaneously predicts the bounding boxes of characters and their embedding vectors, turning text detection into a simple clustering task in the character embedding space. The proposed method does not rely on the strong assumption that text forms a straight line, which gives it flexibility on arbitrarily curved or perspectively distorted text. For the character detection task, a dense prediction subnetwork is designed to obtain the confidence scores and bounding boxes of characters. For the character embedding task, a subnetwork is trained with a contrastive loss to project detected characters into the embedding space. The two tasks share a backbone CNN from which multi-scale feature maps are extracted. The final text regions are obtained by thresholding the character confidence and the embedding distance between character pairs. We evaluated our method on ICDAR13, ICDAR15, MSRA-TD500, and Total-Text. The proposed method achieves state-of-the-art or comparable performance on all of the datasets and shows a substantial improvement on irregular-text datasets such as Total-Text.
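The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `group_characters`, the threshold values, the use of Euclidean distance, and the choice to compare all character pairs are assumptions made for illustration. Characters whose confidence falls below a threshold are dropped, and the remaining characters are merged into the same text instance whenever the distance between their embeddings is below a second threshold.

```python
import numpy as np


def group_characters(scores, embeddings, score_thresh=0.5, dist_thresh=1.0):
    """Group detected characters into text instances (illustrative sketch).

    scores:     (N,) character confidence scores
    embeddings: (N, D) character embedding vectors
    Both thresholds are hypothetical values, not taken from the paper.
    Returns a sorted list of groups; each group lists original character indices.
    """
    scores = np.asarray(scores, dtype=float)
    embeddings = np.asarray(embeddings, dtype=float)

    # Step 1: keep only characters with sufficient confidence.
    keep = np.flatnonzero(scores >= score_thresh)
    emb = embeddings[keep]
    n = len(keep)

    # Union-find over the kept characters.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Step 2: pairwise Euclidean distances in the embedding space.
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Step 3: link every pair whose embedding distance is below the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] < dist_thresh:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Step 4: collect connected components as text instances.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(int(keep[i]))
    return sorted(groups.values())


scores = [0.9, 0.8, 0.95, 0.2, 0.7]
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.1], [5.2, 5.0]]
text_instances = group_characters(scores, embeddings)
```

In this toy input, the fourth character is discarded for low confidence, and the remaining four fall into two groups of two. A region for each text instance could then be derived from the boxes of its member characters. That step is omitted here because the abstract does not specify it.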


Keywords: Text detection · Character detection · Embedding learning



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jiaming Li (1)
  • Chengquan Zhang (1)
  • Yipeng Sun (1)
  • Junyu Han (1)
  • Errui Ding (1)

  1. Baidu Inc., Beijing, China
