The Visual Computer, Volume 35, Issue 3, pp 399–414

Cascading classifier with discriminative multi-features for a specific 3D object real-time detection

  • Rui Wang
  • Ying Liang
  • Jing Wen Xu
  • Zhi Hai He
Original Article


Real-time detection of a specific 3D object plays an important role in intelligent service robots and intelligent surveillance. In contrast to most existing approaches, which rely on simple template matching, we present a novel discriminative learning-based method, referred to as B-CST (BING - Colour + Shape + Texture), that detects a specific 3D object in video in real time. Instead of the sliding-window technique, we propose an original candidate extraction strategy and develop a new cascade classifier for recognition. In the candidate extraction stage, the rapid and high-quality objectness measure, binarised normed gradients (BING), is modified to highlight target candidate regions and to suppress undesirable background regions. In the recognition stage, each candidate region is verified and classified as either positive, a class that includes multi-view images of the target, or negative. The cascade classifiers perform recognition with multiple discriminative features: the novel dominant colour histogram, the histogram of oriented gradients, and the original Gabor-CS-LTP feature, which is the centre-symmetric local ternary pattern of a special Gabor magnitude mapping. We evaluate the proposed method on our challenging new dataset of five objects and on two well-known public datasets, and compare it with other detection techniques for a single 3D object. The comparison shows that B-CST delivers both high-quality detection results and high detection speed, meeting the real-time processing requirements of video sequences (approximately 23 fps).
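
The two-stage structure described above (candidate regions first, then a cascade that verifies each candidate feature by feature) can be illustrated with a minimal sketch. This is not the authors' implementation: the stage order, the score fields (`colour_score`, `shape_score`, `texture_score`), the thresholds and the helper names `run_cascade` and `filter_candidates` are all hypothetical stand-ins, and the scores are assumed to have been computed elsewhere. The sketch only shows the general cascade idea, in which a candidate is rejected at the first stage it fails so that later stages are never evaluated for it.

```python
from typing import Callable, List, Sequence, Tuple

# A stage pairs a name with a scoring function and an acceptance threshold.
# All three are illustrative assumptions, not values from the paper.
Stage = Tuple[str, Callable[[dict], float], float]


def run_cascade(candidate: dict, stages: Sequence[Stage]) -> Tuple[bool, str]:
    """Return (accepted, rejecting_stage); rejecting_stage is "" if accepted."""
    for name, score_fn, threshold in stages:
        if score_fn(candidate) < threshold:
            # Early rejection: the remaining stages are skipped.
            return False, name
    return True, ""


def filter_candidates(candidates: Sequence[dict],
                      stages: Sequence[Stage]) -> List[dict]:
    """Keep only the candidates that pass every stage of the cascade."""
    return [c for c in candidates if run_cascade(c, stages)[0]]


# Hypothetical stages standing in for colour, shape and texture classifiers.
stages: List[Stage] = [
    ("colour", lambda c: c["colour_score"], 0.5),
    ("shape", lambda c: c["shape_score"], 0.5),
    ("texture", lambda c: c["texture_score"], 0.5),
]

# Toy candidate regions with precomputed (made-up) per-feature scores.
candidates = [
    {"id": 0, "colour_score": 0.9, "shape_score": 0.8, "texture_score": 0.7},
    {"id": 1, "colour_score": 0.2, "shape_score": 0.9, "texture_score": 0.9},
    {"id": 2, "colour_score": 0.8, "shape_score": 0.6, "texture_score": 0.1},
]

kept = filter_candidates(candidates, stages)
```

In this toy run only candidate 0 survives: candidate 1 is rejected at the colour stage and candidate 2 at the texture stage. Rejecting a candidate at the first failed stage is what lets a cascade spend little time on regions that are clearly not the target.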


Specific 3D object detection; Candidate extraction; Candidate region recognition; Discriminative multi-features; Cascaded classifiers



The authors thank the anonymous reviewers for their assistance. This work was supported by a grant from the National Natural Science Foundation of China (61673039).

Supplementary material

Supplementary material 1: 371_2018_1472_MOESM1_ESM.avi (AVI, 48127 KB)
Supplementary material 2: 371_2018_1472_MOESM2_ESM.avi (AVI, 98031 KB)


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Rui Wang (1), corresponding author
  • Ying Liang (1)
  • Jing Wen Xu (1)
  • Zhi Hai He (2)
  1. School of Instrumentation Science and Opto-Electronics Engineering, Laboratory of Precision Opto-Mechatronics Technology, Beihang University, Haidian District, Beijing, China
  2. Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, USA
