Multimedia Tools and Applications, Volume 78, Issue 10, pp 13263–13278

Real-time small traffic sign detection with revised faster-RCNN

  • Cen Han
  • Guangyu Gao
  • Yu Zhang


Traffic sign detection is a crucial step for autonomous driving and intelligent transportation. Promising results have been achieved in this area, but most of them are limited to ideal conditions in which the traffic signs are clear and large. In practice, traffic sign detection is usually built on general object detection methods. However, existing object detection methods fail to detect most traffic signs, especially in surveillance videos or driving recorder videos. In such footage, traffic signs such as traffic lights or distant road signs typically cover less than 5% of the whole image. Therefore, in this paper, we propose a real-time small traffic sign detection approach based on a revised Faster-RCNN. First, we use a small region proposal generator to extract the characteristics of small traffic signs: because the stride of the original generator is too large, we remove the pool4 layer of VGG-16 and adopt dilation for ResNet. Second, we combine the revised Faster-RCNN architecture with Online Hard Example Mining (OHEM) to make the system more robust in locating the regions of small traffic signs. Finally, we conduct extensive experiments and empirical evaluations on several different videos to demonstrate the performance of our approach. The experimental results show that our approach improves the mean average precision by 12.1% over the original object detection algorithm.
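
The two ideas in the abstract, reducing the feature stride and mining hard examples, can be illustrated with a small amount of arithmetic. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 32-pixel sign size, and the loss values are hypothetical, and the hard-example selection is simplified to a plain top-k by loss (the published OHEM procedure also suppresses overlapping regions before selecting). It relies only on the well-known fact that each 2x2 max-pooling layer in VGG-16 halves the feature-map resolution.

```python
# Illustrative sketch only: names and example values are assumptions,
# not taken from the paper's implementation.

# Pooling layers that precede the conv5 features used by a VGG-16 based
# Faster R-CNN. Each 2x2 max-pool has stride 2; the 3x3 convs have stride 1.
VGG16_POOLS_BEFORE_CONV5 = ["pool1", "pool2", "pool3", "pool4"]


def feature_stride(pool_layers, removed=()):
    """Cumulative stride of the feature map fed to the proposal generator."""
    stride = 1
    for name in pool_layers:
        if name not in removed:
            stride *= 2  # every remaining pooling layer halves the resolution
    return stride


def cells_covered(object_size_px, stride):
    """Approximate number of feature-map cells an object spans along one side."""
    return object_size_px / stride


def select_hard_examples(roi_losses, batch_size):
    """Simplified OHEM-style selection: indices of the highest-loss regions."""
    ranked = sorted(range(len(roi_losses)),
                    key=lambda i: roi_losses[i], reverse=True)
    return ranked[:batch_size]


if __name__ == "__main__":
    original = feature_stride(VGG16_POOLS_BEFORE_CONV5)
    revised = feature_stride(VGG16_POOLS_BEFORE_CONV5, removed=("pool4",))
    print("stride with pool4:", original)
    print("stride without pool4:", revised)

    sign_px = 32  # hypothetical side length of a small sign, in pixels
    print("cells per side, original:", cells_covered(sign_px, original))
    print("cells per side, revised:", cells_covered(sign_px, revised))

    losses = [0.1, 2.0, 0.5, 1.2]  # hypothetical per-region losses
    print("hard examples:", select_hard_examples(losses, batch_size=2))
```

Under these assumptions, removing pool4 halves the stride from 16 to 8, so a small sign spans twice as many feature-map cells per side, which gives the proposal generator more spatial detail to work with.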


Keywords: Traffic signs, Small object detection, RCNN, OHEM



This work was supported by the National Natural Science Foundation of China under Grant No. 61401023.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
