Sign text detection in street view images using an integrated feature

Multimedia Tools and Applications

Abstract

Based on the Bag of Visual Words (BoVW) model, this paper proposes a novel method that uses an integrated feature to detect sign text in street view images. BRISK features are first extracted from the street view images for dictionary learning. The Self-Growing and Self-Organized Neural Gas (SGONG) network is then used to adaptively cluster the extracted BRISK descriptors into visual words. A histogram of visual words is then computed to form the appearance feature of the sign text. To eliminate color differences between signs and highlight the histogram similarity among signs of all colors, a color invariant histogram, called the CIHS histogram, is presented to represent the color information of the sign text. By integrating the visual word histograms and the CIHS histograms, an integrated descriptor, called the Appearance and Color (A&C) descriptor, is designed as the input feature of a cascade AdaBoost classifier. In multi-scale sliding-window sign text detection, an integral image is computed over the spatial distribution map of each visual word to avoid repeated feature extraction. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods and detectors built on traditional descriptors.
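The integral-image step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: binary descriptors (such as BRISK's) are packed into Python integers, the visual-word codebook is passed in ready-made (the paper learns it with SGONG, which is not reproduced here), each keypoint is hard-assigned to its nearest word by Hamming distance, and the color histogram is an opaque stand-in for CIHS, whose definition is not given in this abstract. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# A keypoint after quantization: (x, y, visual_word_id).
Keypoint = Tuple[int, int, int]


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary descriptors packed into ints."""
    return bin(a ^ b).count("1")


def assign_word(descriptor: int, codebook: Sequence[int]) -> int:
    """Hard-assign a binary descriptor to its nearest visual word.

    Assumption: the codebook is given as binary centers; learning it
    (SGONG in the paper) is out of scope for this sketch.
    """
    return min(range(len(codebook)), key=lambda k: hamming(descriptor, codebook[k]))


def word_integral_images(keypoints: Sequence[Keypoint], width: int,
                         height: int, num_words: int) -> List[List[List[int]]]:
    """Build one (height+1) x (width+1) integral image per visual word.

    ii[k][y][x] counts keypoints of word k with row < y and column < x.
    """
    integrals = []
    for k in range(num_words):
        # Spatial distribution map of word k: keypoint count per pixel.
        word_map = [[0] * width for _ in range(height)]
        for x, y, word in keypoints:
            if word == k:
                word_map[y][x] += 1
        ii = [[0] * (width + 1) for _ in range(height + 1)]
        for y in range(height):
            row_sum = 0
            for x in range(width):
                row_sum += word_map[y][x]
                ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
        integrals.append(ii)
    return integrals


def window_histogram(integrals, x0: int, y0: int, w: int, h: int,
                     normalize: bool = True) -> List[float]:
    """K-bin visual-word histogram of the window [x0, x0+w) x [y0, y0+h).

    Each bin costs four integral-image lookups, independent of window size.
    """
    x1, y1 = x0 + w, y0 + h
    counts = [ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0] for ii in integrals]
    total = sum(counts)
    if normalize and total > 0:
        return [c / total for c in counts]
    return [float(c) for c in counts]


def ac_descriptor(appearance: Sequence[float], color: Sequence[float]) -> List[float]:
    """Hypothetical A&C-style descriptor: the appearance histogram followed by
    a color histogram. The color part is an opaque stand-in for CIHS."""
    return list(appearance) + list(color)


if __name__ == "__main__":
    codebook = [0b0000, 0b1111]  # two toy 4-bit "visual words"
    raw = [(1, 1, 0b0001), (2, 1, 0b1110), (6, 6, 0b1111)]
    keypoints = [(x, y, assign_word(d, codebook)) for x, y, d in raw]
    integrals = word_integral_images(keypoints, width=8, height=8, num_words=2)
    print(window_histogram(integrals, 0, 0, 4, 4))  # [0.5, 0.5]
    print(window_histogram(integrals, 4, 4, 4, 4))  # [0.0, 1.0]
```

Building the K integral images is a one-time O(K·W·H) cost per image; after that, every window at any position or size gets its K-bin histogram from 4K lookups, so sliding the window never requires re-extracting or re-quantizing features.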


Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61671376 and 61671374.

Author information

Correspondence to Fan Zhao.

About this article

Cite this article

Zhao, F., Yang, Y., Zhang, Hy. et al. Sign text detection in street view images using an integrated feature. Multimed Tools Appl 77, 28049–28076 (2018). https://doi.org/10.1007/s11042-018-5975-8


  • DOI: https://doi.org/10.1007/s11042-018-5975-8
