Three-dimensional rapid registration and reconstruction of multi-view rigid objects based on end-to-end deep surface model


Three-dimensional object reconstruction from multi-view images is an important topic in computer vision that has attracted enormous attention over the past decades. With advances in deep learning, remarkable progress has been made in three-dimensional object reconstruction in recent years. In this paper, we propose a method for rapid three-dimensional registration and reconstruction of multi-view rigid objects based on an end-to-end deep surface model. First, we introduce a local stereo matching algorithm based on an improved census transform and multi-scale space, aiming to improve the matching results. In the cost aggregation step, a guided map filtering algorithm with excellent gradient-preserving properties is introduced into a Gaussian pyramid structure, and regularization is added to strengthen cost volume consistency. Second, an improved Inception-ResNet module is added to strengthen the feature extraction ability of the network; multiple features are extracted using multiple network structures and are then input sequentially into a VRNN module to enhance the reconstruction of multi-view images. The experimental results show that the proposed method not only achieves better reconstruction results but also reconstructs more detail and requires less training time.
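For readers unfamiliar with census-based matching, the following is a minimal sketch of the basic (unimproved) census transform with a Hamming-distance matching cost and a winner-take-all disparity choice. It is not the improved transform proposed in the paper, and it omits the guided filtering, Gaussian pyramid, and regularization of the cost aggregation step. The function names (census_transform, hamming_cost, best_disparity) and the border-clamping choice are assumptions made for illustration only.

```python
from typing import List

Image = List[List[int]]


def census_transform(img: Image, radius: int = 1) -> List[List[int]]:
    """Basic census transform: one bit per neighbour, set when neighbour < centre."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            code = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # the centre pixel is not compared with itself
                    # Assumption: clamp coordinates so border pixels reuse
                    # the nearest valid neighbour.
                    ny = min(max(y + dy, 0), h - 1)
                    nx = min(max(x + dx, 0), w - 1)
                    code = (code << 1) | (1 if img[ny][nx] < img[y][x] else 0)
            out[y][x] = code
    return out


def hamming_cost(a: int, b: int) -> int:
    """Matching cost: number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


def best_disparity(census_left, census_right, y: int, x: int, max_disp: int) -> int:
    """Winner-take-all disparity for one left pixel, with no cost aggregation."""
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if x - d < 0:
            break  # candidate would fall outside the right image
        cost = hamming_cost(census_left[y][x], census_right[y][x - d])
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


if __name__ == "__main__":
    # Toy rectified pair: the left view is the right view shifted by 2 pixels.
    right = [[1, 2, 9, 4, 3, 8, 5]] * 3
    left = [[1, 1, 1, 2, 9, 4, 3]] * 3
    cl, cr = census_transform(left), census_transform(right)
    print(best_disparity(cl, cr, y=1, x=5, max_disp=3))
```

Because the census code depends only on the intensity ordering within the window, the cost is insensitive to monotonic brightness changes between views, which is the usual motivation for census-based costs. A per-pixel winner-take-all choice like this is noisy in practice, which is why a cost aggregation step normally follows.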






This work was financially supported by the Major Project of Philosophy and Social Science Research in Colleges and Universities of Jiangsu Province (2018SJZDA015), the Research Foundation Project of Nanjing Institute of Technology (YKJ201619), and the 2019 Major Project of Philosophy and Social Science Research in Colleges and Universities of Jiangsu Province (2019SJZDA118).

Author information




The authors contributed equally to this research, and the paper was initiated by the first author. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lijun Xu.

Ethics declarations

Conflict of interest

We declare that there are no competing interests.

Availability of data and materials

Data will not be shared as the authors do not have permission to share data from the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yan, S., Xu, L. & Wang, S. Three-dimensional rapid registration and reconstruction of multi-view rigid objects based on end-to-end deep surface model. J Supercomput 76, 9010–9030 (2020).



  • Three-dimensional reconstruction
  • Deep surface model
  • Multi-view
  • Rigid objects
  • Local stereo matching
  • Inception-ResNet