Sign text detection in street view images using an integrated feature

Zhao, Fan; Yang, Yao; Zhang, Hai-yan; Yang, Lin-lin; Zhang, Lin

doi:10.1007/s11042-018-5975-8

Published: 26 April 2018

Volume 77, pages 28049–28076, (2018)

Multimedia Tools and Applications

Fan Zhao (ORCID: orcid.org/0000-0001-6672-7948)¹, Yao Yang¹, Hai-yan Zhang¹, Lin-lin Yang¹ & Lin Zhang¹

353 Accesses
4 Citations

Abstract

Based on the Bag of Visual Words (BoVW) model, this paper proposes a novel method that uses an integrated feature to detect sign text in street view images. BRISK features are first extracted from the street view images for dictionary learning. The Self-Growing and Self-Organized Neural Gas (SGONG) network is then used to adaptively cluster the extracted BRISK descriptors into visual words. The histogram of visual words is then computed to form the appearance feature of the sign text. To eliminate color differences and highlight the histogram similarity across signs of all colors, a color-invariant histogram, called the CIHS histogram, is presented to represent the color information of the sign text. By integrating the visual word histograms and the CIHS histograms, an integrated descriptor, called the Appearance and Color (A&C) descriptor, is designed as the input feature for a cascade AdaBoost classifier. In multi-scale sliding-window sign text detection, an integral image is applied to the spatial distribution map of each visual word to avoid repeated feature extraction. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods and detectors that use traditional descriptors.
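The integral-image step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each keypoint has already been assigned a visual word index (the BRISK extraction and SGONG clustering are outside this sketch), and the names `build_integral_maps` and `window_histogram` are hypothetical. One summed-area table is built per visual word, so the visual word histogram of any window is read with four lookups per word instead of recounting keypoints.

```python
from typing import List, Tuple

# Assumed input format: (row, col, visual_word_index) for each keypoint.
Keypoint = Tuple[int, int, int]


def build_integral_maps(keypoints: List[Keypoint], height: int, width: int,
                        num_words: int) -> List[List[List[int]]]:
    """Build one summed-area table per visual word.

    tables[w][r][c] is the number of keypoints of word w located in
    rows < r and columns < c (tables carry one extra zero row and column).
    """
    tables = [[[0] * (width + 1) for _ in range(height + 1)]
              for _ in range(num_words)]
    # Spatial distribution map: mark where each visual word occurs.
    for r, c, w in keypoints:
        tables[w][r + 1][c + 1] += 1
    # Turn each map into its integral image in place.
    for table in tables:
        for r in range(1, height + 1):
            row_sum = 0
            for c in range(1, width + 1):
                row_sum += table[r][c]
                table[r][c] = table[r - 1][c] + row_sum
    return tables


def window_histogram(tables: List[List[List[int]]], top: int, left: int,
                     h: int, w: int) -> List[int]:
    """Visual word histogram of the window, four lookups per word."""
    bottom, right = top + h, left + w
    return [t[bottom][right] - t[top][right] - t[bottom][left] + t[top][left]
            for t in tables]


# Toy example: a 4x4 image, two visual words, five keypoints.
demo_keypoints = [(0, 0, 0), (1, 2, 1), (3, 3, 0), (2, 1, 1), (1, 1, 0)]
demo_tables = build_integral_maps(demo_keypoints, height=4, width=4, num_words=2)
print(window_histogram(demo_tables, top=0, left=0, h=2, w=3))  # [2, 1]
```

In a multi-scale sliding-window detector the tables would be built once per image, after which the cost of each window's appearance histogram no longer depends on the window size.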

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61671376 and 61671374.

Author information

Authors and Affiliations

Department of Information Science, Faculty of Printing, Packaging Engineering and Digital Media Technology, Xi’an University of Technology, Xi’an, Shaanxi, 710048, People’s Republic of China
Fan Zhao, Yao Yang, Hai-yan Zhang, Lin-lin Yang & Lin Zhang

Corresponding author

Correspondence to Fan Zhao.

About this article

Cite this article

Zhao, F., Yang, Y., Zhang, Hy. et al. Sign text detection in street view images using an integrated feature. Multimed Tools Appl 77, 28049–28076 (2018). https://doi.org/10.1007/s11042-018-5975-8

Received: 01 November 2017
Revised: 22 March 2018
Accepted: 04 April 2018
Published: 26 April 2018
Issue Date: November 2018
DOI: https://doi.org/10.1007/s11042-018-5975-8
