Abstract
Monocular dense 3D reconstruction of deformable objects is a hard, ill-posed problem in computer vision. Current techniques either require dense correspondences and rely on motion and deformation cues, or assume that a highly accurate reconstruction of at least one frame (referred to as a template) is given in advance, and then operate in the manner of non-rigid tracking. Accurately computing dense point tracks often requires multiple frames and can be computationally expensive. The availability of a template, in turn, is a very strong prior that restricts system operation to pre-defined environments and scenarios. In this work, we propose a new hybrid approach to monocular non-rigid reconstruction, which we call the Hybrid Deformation Model Network (HDM-Net). In our approach, a deep neural network learns a deformation model using a combination of domain-specific loss functions. We train the network on multiple states of a non-rigidly deforming structure whose shape at rest is known. HDM-Net learns several reconstruction cues, including texture-dependent surface deformations, shading and contours. We show that HDM-Net generalises to states not present in the training dataset, to unseen textures and to new illumination conditions. Experiments with noisy data and a comparison with other methods demonstrate the robustness and accuracy of the proposed approach and suggest possible applications of the new technique in interventional diagnostics and augmented reality.
Notes
1. The dataset is available upon request.
2. When executed on a batch of 100 frames with \(73^2\) points each, a C++ version of [71] takes 1.47 ms per frame on our hardware; for a 400-frame batch, it requires 5.27 ms per frame.
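The arithmetic behind Note 2 can be made explicit. The short Python sketch below only restates the note's own figures (\(73^2\) points per frame, 1.47 ms and 5.27 ms per frame for 100- and 400-frame batches): it multiplies the per-frame time by the batch length to obtain the total time per batch and divides by the point count to obtain an average cost per point. The helper name `batch_total_ms` is purely illustrative and not part of any published code.

```python
# Restates the timings from Note 2 (C++ version of [71], authors' hardware).
POINTS_PER_FRAME = 73 ** 2  # 5329 points per frame

# batch length (frames) -> reported time per frame in milliseconds
TIMINGS_MS_PER_FRAME = {100: 1.47, 400: 5.27}


def batch_total_ms(n_frames: int, ms_per_frame: float) -> float:
    """Hypothetical helper: total time for a whole batch, in milliseconds."""
    return n_frames * ms_per_frame


for n, per_frame in TIMINGS_MS_PER_FRAME.items():
    total = batch_total_ms(n, per_frame)
    # average cost per point in microseconds (1 ms = 1000 us)
    per_point_us = per_frame * 1000 / POINTS_PER_FRAME
    print(f"{n} frames: {total:.0f} ms total, {per_point_us:.3f} us per point")
```

Quadrupling the batch from 100 to 400 frames raises the per-frame cost by roughly 3.6× (5.27 / 1.47), so the total batch time grows from about 147 ms to about 2.1 s, i.e. roughly 14× rather than 4×.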
References
Agudo, A., Agapito, L., Calvo, B., Montiel, J.M.M.: Good vibrations: a modal analysis approach for sequential non-rigid structure from motion. In: Computer Vision and Pattern Recognition (CVPR), pp. 1558–1565 (2014)
Agudo, A., Moreno-Noguer, F.: Force-based representation for non-rigid shape and elastic model estimation. Trans. Pattern Anal. Mach. Intell. (TPAMI) 40(9), 2137–2150 (2018)
Agudo, A., Moreno-Noguer, F.: A scalable, efficient, and accurate solution to non-rigid structure from motion. Comput. Vis. Image Underst. (CVIU) 167, 121–133 (2018)
Agudo, A., Moreno-Noguer, F., Calvo, B., Montiel, J.M.M.: Sequential non-rigid structure from motion using physical priors. Trans. Pattern Anal. Mach. Intell. (TPAMI) 38, 979–994 (2016)
Akhter, I., Sheikh, Y., Khan, S., Kanade, T.: Trajectory space: a dual representation for nonrigid structure from motion. Trans. Pattern Anal. Mach. Intell. (TPAMI) 33(7), 1442–1456 (2011)
Ansari, M., Golyanik, V., Stricker, D.: Scalable dense monocular surface reconstruction. In: International Conference on 3D Vision (3DV) (2017)
Birkbeck, N., Cobza, D., Jägersand, M.: Basis constrained 3D scene flow on a dynamic proxy. In: International Conference on Computer Vision (ICCV), pp. 1967–1974 (2011)
Blanz, V., Vetter, T.: A morphable model for the synthesis of 3D faces. ACM Trans. Graph. (TOG) 187–194 (1999)
Brand, M.: A direct method for 3D factorization of nonrigid motion observed in 2D. In: Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 122–128 (2005)
Bregler, C., Hertzmann, A., Biermann, H.: Recovering non-rigid 3D shape from image streams. In: Computer Vision and Pattern Recognition (CVPR), pp. 690–696 (2000)
Brunet, F., Hartley, R., Bartoli, A., Navab, N., Malgouyres, R.: Monocular template-based reconstruction of smooth and inextensible surfaces. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010. LNCS, vol. 6494, pp. 52–66. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19318-7_5
Del Bue, A.: A factorization approach to structure from motion with shape priors. In: Computer Vision and Pattern Recognition (CVPR) (2008)
Chhatkuli, A., Pizarro, D., Collins, T., Bartoli, A.: Inextensible non-rigid structure-from-motion by second-order cone programming. Trans. Pattern Anal. Mach. Intell. (TPAMI) 40(10), 2428–2441 (2018)
Choy, C.B., Xu, D., Gwak, J.Y., Chen, K., Savarese, S.: 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 628–644. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_38
Cohen, L.D., Cohen, I.: Deformable models for 3-D medical images using finite elements and balloons. In: Computer Vision and Pattern Recognition (CVPR), pp. 592–598 (1992)
Cook, R.L., Torrance, K.E.: A reflectance model for computer graphics. ACM Trans. Graph. (TOG) 1(1), 7–24 (1982)
Curless, B., Levoy, M.: A volumetric method for building complex models from range images. ACM Trans. Graph. (TOG) 303–312 (1996)
Dai, Y., Li, H., He, M.: A simple prior-free method for non-rigid structure-from-motion factorization. Int. J. Comput. Vis. (IJCV) 107(2), 101–122 (2014)
Dou, P., Shah, S.K., Kakadiaris, I.A.: End-to-end 3D face reconstruction with deep neural networks. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems (NIPS), pp. 2366–2374 (2014)
Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3D object reconstruction from a single image. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Fayad, J., Agapito, L., Del Bue, A.: Piecewise quadratic reconstruction of non-rigid surfaces from monocular sequences. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 297–310. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_22
Blender Foundation: Blender, v. 2.79a. Open source 3D creation (2018). https://www.blender.org/
Gallardo, M., Collins, T., Bartoli, A.: Using shading and a 3D template to reconstruct complex surface deformations. In: British Machine Vision Conference (BMVC) (2016)
Gallardo, M., Collins, T., Bartoli, A.: Dense non-rigid structure-from-motion and shading with unknown albedos. In: International Conference on Computer Vision (ICCV) (2017)
Garg, R., Roussos, A., Agapito, L.: Dense variational reconstruction of non-rigid surfaces from monocular video. In: Computer Vision and Pattern Recognition (CVPR), pp. 1272–1279 (2013)
Garg, R., Kumar, V.B.G., Carneiro, G., Reid, I.: Unsupervised CNN for single view depth estimation: geometry to the rescue. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 740–756. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_45
Garrido, P., et al.: Reconstruction of personalized 3D face rigs from monocular video. ACM Trans. Graph. (TOG) 35(3), 28:1–28:15 (2016)
Giannarou, S., Visentini-Scarzanella, M., Yang, G.Z.: Probabilistic tracking of affine-invariant anisotropic regions. Trans. Pattern Anal. Mach. Intell. (TPAMI) 35(1), 130–143 (2013)
Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Golyanik, V., Fetzer, T., Stricker, D.: Accurate 3D reconstruction of dynamic scenes from monocular image sequences with severe occlusions. In: Winter Conference on Applications of Computer Vision (WACV) (2017)
Golyanik, V., Mathur, A.S., Stricker, D.: NRSFM-flow: recovering non-rigid scene flow from monocular image sequences. In: British Machine Vision Conference (BMVC) (2016)
Golyanik, V., Stricker, D.: Dense batch non-rigid structure from motion in a second. In: Winter Conference on Applications of Computer Vision (WACV), pp. 254–263 (2017)
Gotardo, P.F.U., Martinez, A.M.: Non-rigid structure from motion with complementary rank-3 spaces. In: Computer Vision and Pattern Recognition (CVPR), pp. 3065–3072 (2011)
Guan, P., Weiss, A., Bălan, A.O., Black, M.J.: Estimating human shape and pose from a single image. In: International Conference on Computer Vision (ICCV), pp. 1381–1388 (2009)
Gumerov, N., Zandifar, A., Duraiswami, R., Davis, L.S.: Structure of applicable surfaces from single views. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3023, pp. 482–496. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24672-5_38
Hamsici, O.C., Gotardo, P.F.U., Martinez, A.M.: Learning spatially-smooth mappings in non-rigid structure from motion. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 260–273. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33765-9_19
Haouchine, N., Dequidt, J., Berger, M.O., Cotin, S.: Single view augmentation of 3D elastic objects. In: International Symposium on Mixed and Augmented Reality (ISMAR), pp. 229–236 (2014)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition (CVPR) (2016)
Jackson, A.S., Bulat, A., Argyriou, V., Tzimiropoulos, G.: Large pose 3D face reconstruction from a single image via direct volumetric CNN regression. In: International Conference on Computer Vision (ICCV) (2017)
Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 2017–2025 (2015)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 1097–1105 (2012)
Lee, M., Cho, J., Oh, S.: Procrustean normal distribution for non-rigid structure from motion. Trans. Pattern Anal. Mach. Intell. (TPAMI) 39(7), 1388–1400 (2017)
Liu, F., Shen, C., Lin, G.: Deep convolutional neural fields for depth estimation from a single image. In: Computer Vision and Pattern Recognition (CVPR) (2015)
Liu-Yin, Q., Yu, R., Agapito, L., Fitzgibbon, A., Russell, C.: Better together: joint reasoning for non-rigid 3D reconstruction with specularities and shading. In: British Machine Vision Conference (BMVC) (2016)
Malti, A., Hartley, R., Bartoli, A., Kim, J.H.: Monocular template-based 3D reconstruction of extensible surfaces with local linear elasticity. In: Computer Vision and Pattern Recognition (CVPR), pp. 1522–1529 (2013)
McInerney, T., Terzopoulos, D.: A finite element model for 3D shape reconstruction and nonrigid motion tracking. In: International Conference on Computer Vision (ICCV), pp. 518–523 (1993)
Mitiche, A., Mathlouthi, Y., Ben Ayed, I.: Monocular concurrent recovery of structure and motion scene flow. Front. ICT 2, 16 (2015)
Moreno-Noguer, F., Porta, J.M., Fua, P.: Exploring ambiguities for monocular non-rigid shape estimation. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6313, pp. 370–383. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15558-1_27
NVIDIA Corporation: NVIDIA CUDA C programming guide (2018). Version 9.0
Paladini, M., Del Bue, A., Xavier, J., Agapito, L., Stosić, M., Dodig, M.: Optimal metric projections for deformable and articulated structure-from-motion. Int. J. Comput. Vis. (IJCV) 96(2), 252–276 (2012)
Paladini, M., Bartoli, A., Agapito, L.: Sequential non-rigid structure-from-motion with the 3D-implicit low-rank shape model. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6312, pp. 15–28. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15552-9_2
Paszke, A., et al.: Automatic differentiation in PyTorch. In: Advances in Neural Information Processing Systems Workshops (NIPS-W) (2017)
Paszke, A., Gross, S., Massa, F., Chintala, S.: PyTorch (2018). https://github.com/pytorch
Perriollat, M., Hartley, R., Bartoli, A.: Monocular template-based reconstruction of inextensible surfaces. Int. J. Comput. Vis. (IJCV) 95(2), 124–137 (2011)
Pumarola, A., Agudo, A., Porzi, L., Sanfeliu, A., Lepetit, V., Moreno-Noguer, F.: Geometry-aware network for non-rigid shape prediction from a single view. In: Computer Vision and Pattern Recognition (CVPR), pp. 4681–4690 (2018)
Riegler, G., Ulusoy, A.O., Bischof, H., Geiger, A.: OctNetFusion: learning depth fusion from data. In: International Conference on 3D Vision (3DV) (2017)
Russell, C., Fayad, J., Agapito, L.: Dense non-rigid structure from motion. In: International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), pp. 509–516 (2012)
Salzmann, M., Fua, P.: Reconstructing sharply folding surfaces: a convex formulation. In: Computer Vision and Pattern Recognition (CVPR), pp. 1054–1061 (2009)
Salzmann, M., Fua, P.: Linear local models for monocular reconstruction of deformable surfaces. Trans. Pattern Anal. Mach. Intell. (TPAMI) 33(5), 931–944 (2011)
Salzmann, M., Hartley, R., Fua, P.: Convex optimization for deformable surface 3-D tracking. In: International Conference on Computer Vision (ICCV) (2007)
Salzmann, M., Lepetit, V., Fua, P.: Deformable surface tracking ambiguities. In: Computer Vision and Pattern Recognition (CVPR) (2007)
Sela, M., Richardson, E., Kimmel, R.: Unrestricted facial geometry reconstruction using image-to-image translation. In: International Conference on Computer Vision (ICCV) (2017)
Stay & Play Rotorua Ltd: A hot balloon. http://stayandplaynz.com/rotorua/the-real-new-zealand-experience/. Accessed 29 June 2018
Suwajanakorn, S., Kemelmacher-Shlizerman, I., Seitz, S.M.: Total moving face reconstruction. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 796–812. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10593-2_52
Taetz, B., Bleser, G., Golyanik, V., Stricker, D.: Occlusion-aware video registration for highly non-rigid objects. In: Winter Conference on Applications of Computer Vision (WACV) (2016)
Tao, L., Matuszewski, B.J.: Non-rigid structure from motion with diffusion maps prior. In: Computer Vision and Pattern Recognition (CVPR), pp. 1530–1537 (2013)
Tateno, K., Tombari, F., Laina, I., Navab, N.: CNN-SLAM: real-time dense monocular SLAM with learned depth prediction. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Tewari, A., et al.: MoFA: model-based deep convolutional face autoencoder for unsupervised monocular reconstruction. In: International Conference on Computer Vision (ICCV) (2017)
Textures.com: WrincklesHanging0037. https://www.textures.com/browse/hanging/112398. Accessed 29 June 2018
Tomasi, C., Kanade, T.: Shape and motion from image streams under orthography: a factorization method. Int. J. Comput. Vis. (IJCV) 9, 137–154 (1992)
Tome, D., Russell, C., Agapito, L.: Lifting from the deep: convolutional 3D pose estimation from a single image. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Torresani, L., Hertzmann, A., Bregler, C.: Nonrigid structure-from-motion: estimating shape and motion with hierarchical priors. Trans. Pattern Anal. Mach. Intell. (TPAMI) 30(5), 878–892 (2008)
Varol, A., Shaji, A., Salzmann, M., Fua, P.: Monocular 3D reconstruction of locally textured surfaces. Trans. Pattern Anal. Mach. Intell. (TPAMI) 34(6), 1118–1130 (2012)
Vicente, S., Agapito, L.: Soft inextensibility constraints for template-free non-rigid reconstruction. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7574, pp. 426–440. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33712-3_31
Wandt, B., Ackermann, H., Rosenhahn, B.: 3D reconstruction of human motion from monocular image sequences. Trans. Pattern Anal. Mach. Intell. (TPAMI) 38(8), 1505–1516 (2016)
White, R., Forsyth, D.A.: Combining cues: shape from shading and texture. In: Computer Vision and Pattern Recognition (CVPR), pp. 1809–1816 (2006)
Xiao, D., Yang, Q., Yang, B., Wei, W.: Monocular scene flow estimation via variational method. Multimedia Tools Appl. 76(8), 10575–10597 (2017)
Xiao, J., Chai, J., Kanade, T.: A closed-form solution to non-rigid shape and motion recovery. Int. J. Comput. Vis. (IJCV) 67(2), 233–246 (2006)
Yu, R., Russell, C., Campbell, N.D.F., Agapito, L.: Direct, dense, and deformable: template-based non-rigid 3D reconstruction from RGB video. In: International Conference on Computer Vision (ICCV) (2015)
Zhou, X., Zhu, M., Pavlakos, G., Leonardos, S., Derpanis, K.G., Daniilidis, K.: MonoCap: monocular human motion capture using a CNN coupled with a geometric prior. Trans. Pattern Anal. Mach. Intell. (TPAMI) (2018)
Zhu, S., Zhang, L., Smith, B.M.: Model evolution: an incremental approach to non-rigid structure from motion. In: Computer Vision and Pattern Recognition (CVPR), pp. 1165–1172 (2010)
Acknowledgement
Development of HDM-Net was supported by the project DYNAMICS (01IW15003) of the German Federal Ministry of Education and Research (BMBF). The authors thank NVIDIA Corporation for the hardware donations.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Golyanik, V., Shimada, S., Varanasi, K., Stricker, D. (2018). HDM-Net: Monocular Non-rigid 3D Reconstruction with Learned Deformation Model. In: Bourdot, P., Cobb, S., Interrante, V., Kato, H., Stricker, D. (eds) Virtual Reality and Augmented Reality. EuroVR 2018. Lecture Notes in Computer Science, vol. 11162. Springer, Cham. https://doi.org/10.1007/978-3-030-01790-3_4
DOI: https://doi.org/10.1007/978-3-030-01790-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01789-7
Online ISBN: 978-3-030-01790-3
eBook Packages: Computer Science, Computer Science (R0)