Detecting Text in the Wild with Deep Character Embedding Network

Li, Jiaming; Zhang, Chengquan; Sun, Yipeng; Han, Junyu; Ding, Errui

doi:10.1007/978-3-030-20870-7_31

Detecting Text in the Wild with Deep Character Embedding Network

Jiaming Li¹²,
Chengquan Zhang¹²,
Yipeng Sun¹²,
Junyu Han¹² &
…
Errui Ding¹²

Conference paper
First Online: 25 May 2019

1798 Accesses
9 Citations

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11364))

Abstract

Most text detection methods hypothesize texts are horizontal or multi-oriented and thus define quadrangles as the basic detection unit. However, text in the wild is usually perspectively distorted or curved, which can not be easily tackled by existing approaches. In this paper, we propose a deep character embedding network (CENet) which simultaneously predicts the bounding boxes of characters and their embedding vectors, thus making text detection a simple clustering task in the character embedding space. The proposed method does not require strong assumptions of forming a straight line on general text detection, which provides flexibility on arbitrarily curved or perspectively distorted text. For character detection task, a dense prediction subnetwork is designed to obtain the confidence score and bounding boxes of characters. For character embedding task, a subnet is trained with contrastive loss to project detected characters into embedding space. The two tasks share a backbone CNN from which the multi-scale feature maps are extracted. The final text regions can be easily achieved by a thresholding process on character confidence and embedding distance of character pairs. We evaluated our method on ICDAR13, ICDAR15, MSRA-TD500, and Total Text. The proposed method achieves state-of-the-art or comparable performance on all of the datasets, and shows a substantial improvement in the irregular-text datasets, i.e. Total-Text.

J. Liu and C. Zhang—These authors contribute equally in this work.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Almazán, J., Gordo, A., Fornés, A., Valveny, E.: Word spotting and recognition with embedded attributes. IEEE Trans. PAMI 36(12), 2552–2566 (2014)
Article Google Scholar
Chng, C.K., Chan, C.S.: Total-text: a comprehensive dataset for scene text detection and recognition. CoRR abs/1710.10400 (2017)
Google Scholar
Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: Proceedings of CVPR, vol. 1, pp. 539–546. IEEE (2005)
Google Scholar
Deng, D., Liu, H., Li, X., Cai, D.: PixeLlink: detecting scene text via instance segmentation. arXiv preprint arXiv:1801.01315 (2018)
Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: Proceedings of CVPR, pp. 2963–2970 (2010)
Google Scholar
Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: Proceedings of CVPR, pp. 2315–2324 (2016)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
He, P., Huang, W., He, T., Zhu, Q., Qiao, Y., Li, X.: Single shot text detector with regional attention. In: Proceedings of ICCV (2017)
Google Scholar
He, T., Huang, W., Qiao, Y., Yao, J.: Text-attentional convolutional neural network for scene text detection. IEEE Trans. Image Process. 25(6), 2529–2541 (2016)
Article MathSciNet Google Scholar
He, W., Zhang, X.Y., Yin, F., Liu, C.L.: Deep direct regression for multi-oriented scene text detection. In: Proceedings of ICCV (2017)
Google Scholar
Hoi, S.C., Liu, W., Lyu, M.R., Ma, W.Y.: Learning distance metrics with contextual constraints for image retrieval. In: Proceedings of CVPR, vol. 2, pp. 2072–2078. IEEE (2006)
Google Scholar
Hu, H., Zhang, C., Luo, Y., Wang, Y., Han, J., Ding, E.: WordSup: exploiting word annotations for character based text detection. arXiv preprint arXiv:1708.06720 (2017)
Huang, L., Yang, Y., Deng, Y., Yu, Y.: DenseBox: unifying landmark localization with end to end object detection. arXiv preprint arXiv:1509.04874 (2015)
Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 512–528. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10593-2_34
Chapter Google Scholar
Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: Proceedings of ICDAR, pp. 1156–1160. IEEE (2015)
Google Scholar
Karatzas, D., et al.: ICDAR 2013 robust reading competition. In: Proceedings of ICDAR, pp. 1484–1493. IEEE (2013)
Google Scholar
Li, H., Wang, P., Shen, C.: Towards end-to-end text spotting with convolutional recurrent neural networks. arXiv preprint arXiv:1707.03985 (2017)
Liao, M., Shi, B., Bai, X., Wang, X., Liu, W.: TextBoxes: a fast text detector with a single deep neural network. In: Proceedings of AAAI, pp. 4161–4167 (2017)
Google Scholar
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR (2017)
Google Scholar
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Chapter Google Scholar
Liu, Y., Jin, L.: Deep matching prior network: toward tighter multi-oriented text detection. In: Proceedings of CVPR (2017)
Google Scholar
Liu, Z., Lin, G., Yang, S., Feng, J., Lin, W., Ling Goh, W.: Learning Markov clustering networks for scene text detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6936–6944 (2018)
Google Scholar
Lyu, P., Yao, C., Wu, W., Yan, S., Bai, X.: Multi-oriented scene text detection via corner localization and region segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7553–7563 (2018)
Google Scholar
Ma, J., et al.: Arbitrary-oriented scene text detection via rotation proposals. arXiv preprint arXiv:1703.01086 (2017)
Neumann, L., Matas, J.: Real-time scene text localization and recognition. In: Proceedings of CVPR, pp. 3538–3545 (2012)
Google Scholar
Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015)
Google Scholar
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of CVPR, pp. 815–823 (2015)
Google Scholar
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. arXiv preprint arXiv:1703.06520 (2017)
Shrivastava, A., Gupta, A., Girshick, R.: Training region-based object detectors with online hard example mining. In: CVPR (2016)
Google Scholar
Tian, S., Lu, S., Li, C.: WeText: scene text detection under weak supervision. In: Proceedings of ICCV (2017)
Google Scholar
Tian, S., Pan, Y., Huang, C., Lu, S., Yu, K., Lim Tan, C.: Text flow: A unified text detection system in natural scene images. In: Proceedings of ICCV, pp. 4651–4659 (2015)
Google Scholar
Tian, Z., Huang, W., He, T., He, P., Qiao, Y.: Detecting text in natural image with connectionist text proposal network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 56–72. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_4
Chapter Google Scholar
Wang, J., Zhou, F., Wen, S., Liu, X., Lin, Y.: Deep metric learning with angular loss. In: Proceedings of ICCV, October 2017
Google Scholar
Wilkinson, T., Lindstrom, J., Brun, A.: Neural ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections. In: Proceedings of ICCV, October 2017
Google Scholar
Yao, C., Bai, X., Liu, W., Ma, Y., Tu, Z.: Detecting texts of arbitrary orientations in natural images. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1083–1090. IEEE (2012)
Google Scholar
Yin, F., Liu, C.L.: Handwritten chinese text line segmentation by clustering with distance metric learning. Pattern Recogn. 42(12), 3146–3157 (2009)
Article Google Scholar
Yin, X.C., Pei, W.Y., Zhang, J., Hao, H.W.: Multi-orientation scene text detection with adaptive clustering. IEEE Trans. PAMI 37(9), 1930–1937 (2015)
Article Google Scholar
Yu, J., Jiang, Y., Wang, Z., Cao, Z., Huang, T.: UnitBox: an advanced object detection network. In: Proceedings of ACMMM, pp. 516–520. ACM (2016)
Google Scholar
Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., Bai, X.: Multi-oriented text detection with fully convolutional networks. In: Proceedings of CVPR (2016)
Google Scholar
Zhong, Z., Jin, L., Huang, S.: DeepText: a new approach for text proposal generation and text detection in natural images. In: Proceedings of ICASSP, pp. 1208–1212 (2017)
Google Scholar
Zhou, X., et al.: EAST: an efficient and accurate scene text detector. arXiv preprint arXiv:1704.03155 (2017)
Zhu, S., Zanibbi, R.: A text detection system for natural scenes with convolutional feature learning and cascaded classification. In: Proceedings of CVPR, pp. 625–632 (2016)
Google Scholar

Download references

Author information

Authors and Affiliations

Baidu Inc., Beijing, China
Jiaming Li, Chengquan Zhang, Yipeng Sun, Junyu Han & Errui Ding

Authors

Jiaming Li
View author publications
You can also search for this author in PubMed Google Scholar
Chengquan Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Yipeng Sun
View author publications
You can also search for this author in PubMed Google Scholar
Junyu Han
View author publications
You can also search for this author in PubMed Google Scholar
Errui Ding
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jiaming Li .

Editor information

Editors and Affiliations

IIIT Hyderabad, Hyderabad, India
C.V. Jawahar
ANU, Canberra, ACT, Australia
Hongdong Li
Simon Fraser University, Burnaby, BC, Canada
Greg Mori
ETH Zurich, Zurich, Zürich, Switzerland
Konrad Schindler

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Li, J., Zhang, C., Sun, Y., Han, J., Ding, E. (2019). Detecting Text in the Wild with Deep Character Embedding Network. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science(), vol 11364. Springer, Cham. https://doi.org/10.1007/978-3-030-20870-7_31

Download citation

DOI: https://doi.org/10.1007/978-3-030-20870-7_31
Published: 25 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20869-1
Online ISBN: 978-3-030-20870-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics