Detecting ground control points via convolutional neural network for stereo matching


In this paper, we present a novel approach to detecting ground control points (GCPs) for the stereo matching problem. First, we train a convolutional neural network (CNN) on a large stereo dataset and use the trained model to compute the matching confidence of each pixel. Second, we present a GCP selection scheme based on the maximum matching confidence of each pixel. Finally, we use the selected GCPs to refine the matching costs and then optimize the refined costs with the semi-global matching (SGM) algorithm to improve the final disparity maps. We evaluate our approach on the KITTI 2012 stereo benchmark dataset. Our experiments show that the proposed approach significantly improves the accuracy of the disparity maps.
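The abstract describes four stages: per-pixel matching confidence from a CNN, GCP selection by maximum confidence, cost refinement at the GCPs, and SGM optimization. The Python sketch below chains these stages on a toy similarity volume so the data flow is concrete. It is not the authors' implementation. The CNN is not reproduced (its scores are passed in directly as `sim`), and the softmax confidence, the threshold `tau`, the cost-pinning rule with `penalty`, and all helper names are hypothetical choices made for illustration. Only the path recursion follows Hirschmüller's published SGM formulation, and it is restricted here to four path directions.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: volume[y][x][d] = value at row y, column x, disparity d.
# `sim` is assumed to hold CNN similarity scores (higher = better match).
Volume = List[List[List[float]]]
Gcps = Dict[Tuple[int, int], int]


def confidence_and_disparity(sim: Volume) -> Tuple[List[List[float]], List[List[int]]]:
    """Per pixel: softmax over disparities -> (max probability, argmax disparity)."""
    conf: List[List[float]] = []
    disp: List[List[int]] = []
    for row in sim:
        conf.append([])
        disp.append([])
        for scores in row:
            top = max(scores)
            total = sum(math.exp(s - top) for s in scores)  # stable softmax denominator
            best = max(range(len(scores)), key=lambda d: scores[d])
            conf[-1].append(1.0 / total)  # exp(top - top) / total = max probability
            disp[-1].append(best)
    return conf, disp


def select_gcps(conf: List[List[float]], disp: List[List[int]], tau: float) -> Gcps:
    """Keep pixels whose maximum confidence reaches the hypothetical threshold `tau`."""
    return {(y, x): disp[y][x] for y, row in enumerate(conf)
            for x, c in enumerate(row) if c >= tau}


def refine_costs(sim: Volume, gcps: Gcps, penalty: float) -> Volume:
    """Assumed scheme: cost = max score - score; a GCP costs 0 at its own
    disparity and `penalty` at every other disparity."""
    cost: Volume = []
    for y, row in enumerate(sim):
        cost.append([])
        for x, scores in enumerate(row):
            if (y, x) in gcps:
                dg = gcps[(y, x)]
                cost[-1].append([0.0 if d == dg else penalty for d in range(len(scores))])
            else:
                top = max(scores)
                cost[-1].append([top - s for s in scores])
    return cost


def aggregate_path(cost: Volume, coords, p1: float, p2: float, total: Volume) -> None:
    """Add one SGM path to `total` using Hirschmüller's recursion:
    L(p,d) = C(p,d) + min(L(p-r,d), L(p-r,d-1) + P1, L(p-r,d+1) + P1,
                          min_k L(p-r,k) + P2) - min_k L(p-r,k)
    """
    prev = None
    for y, x in coords:
        c = cost[y][x]
        if prev is None:
            cur = list(c)  # first pixel on the path has no predecessor
        else:
            prev_min = min(prev)
            cur = []
            for d in range(len(c)):
                best = min(prev[d], prev_min + p2)
                if d > 0:
                    best = min(best, prev[d - 1] + p1)
                if d < len(c) - 1:
                    best = min(best, prev[d + 1] + p1)
                cur.append(c[d] + best - prev_min)
        for d, v in enumerate(cur):
            total[y][x][d] += v
        prev = cur


def sgm(cost: Volume, p1: float, p2: float) -> List[List[int]]:
    """Sum 4 path directions (full SGM typically uses 8 or 16), then take the argmin."""
    h, w, n_d = len(cost), len(cost[0]), len(cost[0][0])
    total = [[[0.0] * n_d for _ in range(w)] for _ in range(h)]
    paths = []
    for y in range(h):
        row = [(y, x) for x in range(w)]
        paths += [row, row[::-1]]  # left-to-right and right-to-left
    for x in range(w):
        col = [(y, x) for y in range(h)]
        paths += [col, col[::-1]]  # top-to-bottom and bottom-to-top
    for coords in paths:
        aggregate_path(cost, coords, p1, p2, total)
    return [[min(range(n_d), key=lambda d: s[d]) for s in row] for row in total]


if __name__ == "__main__":
    # Toy 1x5 image, 3 disparities; pixel 2 is ambiguous and disagrees with its neighbours.
    sim = [[[0, 5, 0], [0, 5, 0], [0.0, 0.4, 0.5], [0, 5, 0], [0, 5, 0]]]
    conf, disp = confidence_and_disparity(sim)
    gcps = select_gcps(conf, disp, tau=0.9)
    print(disp)  # [[1, 1, 2, 1, 1]]
    print(sgm(refine_costs(sim, gcps, penalty=2.0), p1=0.5, p2=2.0))  # [[1, 1, 1, 1, 1]]
```

In the toy run, the middle pixel's maximum confidence (about 0.40) falls below `tau`, so it is not selected as a GCP, and its winner-take-all disparity is 2. After the four neighbouring GCPs are pinned to disparity 1, path aggregation pulls the middle pixel to 1 as well. This illustrates, in miniature, how confident pixels can propagate through the SGM smoothness terms.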


Figs. 1–7 (images not reproduced)



Acknowledgements

We thank Wenjing Li for helpful discussions and encouragement. This work is supported by the National Natural Science Foundation of China (No. 61202143, No. 61572409), the Natural Science Foundation of Fujian Province (No. 2013J05100) and the Fujian Province 2011 Collaborative Innovation Center of TCM Health Management.

Author information



Corresponding author

Correspondence to Shaozi Li.


About this article


Cite this article

Zhong, Z., Su, S., Cao, D. et al. Detecting ground control points via convolutional neural network for stereo matching. Multimed Tools Appl 76, 18473–18488 (2017).



Keywords

  • Stereo matching
  • CNN
  • Ground control points
  • Matching confidence