Multimedia Tools and Applications, Volume 76, Issue 18, pp 18473–18488

Detecting ground control points via convolutional neural network for stereo matching

  • Zhun Zhong
  • Songzhi Su
  • Donglin Cao
  • Shaozi Li
  • Zhihan Lv


In this paper, we present a novel approach to detecting ground control points (GCPs) for the stereo matching problem. First, we train a convolutional neural network (CNN) on a large stereo dataset and use the trained model to compute the matching confidence of each pixel. Second, we present a GCP selection scheme based on the maximum matching confidence of each pixel. Finally, the selected GCPs are used to refine the matching costs, and the refined costs are optimized with the semi-global matching algorithm to improve the final disparity maps. We evaluate our approach on the KITTI 2012 stereo benchmark dataset. Our experiments show that the proposed approach significantly improves the accuracy of the disparity maps.
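
The selection and refinement steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the fixed confidence threshold of 0.9, and the refinement rule (zero cost at the selected disparity, a constant high cost at all others) are assumptions made for illustration only. The CNN that produces the confidences and the semi-global matching optimization are omitted.

```python
import numpy as np


def select_gcps(confidence, threshold=0.9):
    """Pick candidate GCPs from a per-pixel matching confidence volume.

    confidence: array of shape (H, W, D), one confidence per disparity.
    threshold:  hypothetical cut-off; the abstract does not give a value.
    Returns a boolean mask (H, W) and the arg-max disparity map (H, W).
    """
    best_disp = confidence.argmax(axis=2)   # most confident disparity
    best_conf = confidence.max(axis=2)      # its confidence value
    return best_conf >= threshold, best_disp


def refine_costs(cost, is_gcp, gcp_disp, high_cost=1.0):
    """Overwrite the matching costs at GCP pixels (assumed rule).

    At each GCP the selected disparity gets cost 0 and every other
    disparity gets high_cost; non-GCP pixels keep their original costs.
    """
    refined = cost.copy()
    ys, xs = np.nonzero(is_gcp)
    refined[ys, xs, :] = high_cost
    refined[ys, xs, gcp_disp[ys, xs]] = 0.0
    return refined


# Toy example: one row, two pixels, three disparity hypotheses.
confidence = np.array([[[0.10, 0.95, 0.20],
                        [0.40, 0.50, 0.45]]])
cost = 1.0 - confidence                     # assumed cost definition
is_gcp, gcp_disp = select_gcps(confidence)
refined = refine_costs(cost, is_gcp, gcp_disp)
```

In this toy example only the first pixel passes the threshold, so only its cost vector is overwritten. The refined volume would then be handed to an optimizer such as semi-global matching, which can propagate the reliable GCP disparities to neighboring pixels.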


Keywords: Stereo matching; CNN; Ground control points; Matching confidence



We thank Wenjing Li for helpful discussions and encouragement. This work is supported by the Natural Science Foundation of China (No. 61202143, No. 61572409), the Natural Science Foundation of Fujian Province (No. 2013J05100), and the Fujian Province 2011 Collaborative Innovation Center of TCM Health Management.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Zhun Zhong (1)
  • Songzhi Su (1)
  • Donglin Cao (1)
  • Shaozi Li (1), Email author
  • Zhihan Lv (2)

  1. Cognitive Science Department, Xiamen University, Xiamen, China
  2. SIAT, Chinese Academy of Science, Shenzhen, China
