Multimedia Tools and Applications, Volume 77, Issue 17, pp 22953–22963

Multi-scale CNNs for 3D model retrieval

  • Weizhi Nie
  • Shu Xiang
  • Anan Liu


Recent advances in low-cost 3D sensors and in mobile devices for virtual 3D models have made 3D data more accessible, and 3D model retrieval is becoming an indispensable function of modern search engines. An effective retrieval model lies at the core of computer vision. As 3D data continues to improve, a large number of methods have been proposed for this problem, most of them addressing feature extraction and object matching. However, most of these methods are unable to fully exploit the information in 3D representations. To address this limitation, we propose a novel multi-layer deep network. First, multiple rendered images are extracted from a 3D object and combined into one representative view, which is the actual input of the network. Then, the multi-layer network is trained and tested on these representative views, yielding a feature learning model that captures both the local and the global information of a 3D object. Finally, a simple Euclidean metric is used to compute the similarity between two 3D models and complete the retrieval task. Extensive experiments demonstrate the superiority of our approach.
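
The final retrieval step, ranking models by Euclidean distance between their feature vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `euclidean_distance` and `retrieve`, the model names, and the three-dimensional toy features are all hypothetical, and the features here are hand-written placeholders rather than outputs of the trained network.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query: Sequence[float],
             gallery: Dict[str, Sequence[float]],
             top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank gallery models by ascending distance to the query feature."""
    ranked = sorted(
        ((name, euclidean_distance(query, feat)) for name, feat in gallery.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:top_k]


# Hypothetical toy features; in the paper's setting each vector would be
# produced by the trained network from a model's representative view.
gallery = {
    "chair_01": [0.9, 0.1, 0.0],
    "chair_02": [0.8, 0.2, 0.1],
    "table_01": [0.1, 0.9, 0.3],
}
query = [1.0, 0.0, 0.0]
results = retrieve(query, gallery, top_k=2)
for name, dist in results:
    print(f"{name}: {dist:.4f}")
```

A smaller distance means a more similar model, so the first entry of `results` is the best match for the query.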


3D model retrieval · CNN · Multi-view · Multi-scale



The work is partially supported by the National Natural Science Foundation of China (No. 61502337).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Electrical and Information Engineering, Tianjin University, Tianjin, China
