
3D convolutional neural network for object recognition: a review

Multimedia Tools and Applications

Abstract

Recognizing an object in an image or image sequence is an important task in computer vision. It is a fundamental image processing operation and plays a crucial role in many real-world applications. The challenges in object recognition include multiple models, multiple poses, complicated backgrounds, and depth variations. Recently developed methods have addressed these challenges and reported remarkable results for 3D objects. This paper presents a comprehensive overview of recent advances in 3D object recognition using Convolutional Neural Networks (CNNs). Alongside the latest progress on 3D images, it gives a general overview of object recognition in 2D, 2.5D, and 3D images.
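To make the core operation concrete, here is a minimal sketch (not taken from the paper) of how a single-channel 3D convolution slides a k×k×k kernel over a voxel occupancy grid, the volumetric input used by voxel-based 3D CNNs. The helper `depth_to_occupancy` is hypothetical. It lifts a 2.5D depth map into a 3D grid by assuming each pixel's depth has already been quantized into one of a fixed number of bins. Real pipelines typically back-project depth using the camera intrinsics instead. As in most deep-learning libraries, the sketch computes cross-correlation (no kernel flip) with stride 1 and no padding.

```python
from typing import List

Grid3D = List[List[List[float]]]  # indexed as [depth][row][col]


def depth_to_occupancy(depth: List[List[int]], depth_bins: int) -> Grid3D:
    """Hypothetical helper: lift a 2.5D depth map (already quantized into
    integer bins) to a binary 3D occupancy grid of shape (depth_bins, H, W)."""
    height, width = len(depth), len(depth[0])
    grid = [[[0.0] * width for _ in range(height)] for _ in range(depth_bins)]
    for y in range(height):
        for x in range(width):
            d = depth[y][x]
            if 0 <= d < depth_bins:  # ignore pixels outside the depth range
                grid[d][y][x] = 1.0
    return grid


def conv3d_valid(volume: Grid3D, kernel: Grid3D) -> Grid3D:
    """Single-channel 3D cross-correlation, stride 1, no padding."""
    vd, vh, vw = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Grid3D = []
    for z in range(vd - kd + 1):
        plane = []
        for y in range(vh - kh + 1):
            row = []
            for x in range(vw - kw + 1):
                # Weighted sum over the kd x kh x kw neighborhood at (z, y, x).
                total = 0.0
                for dz in range(kd):
                    for dy in range(kh):
                        for dx in range(kw):
                            total += volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(total)
            plane.append(row)
        out.append(plane)
    return out


def relu(volume: Grid3D) -> Grid3D:
    """Element-wise ReLU, a common nonlinearity applied after a convolution."""
    return [[[max(0.0, v) for v in row] for row in plane] for plane in volume]


if __name__ == "__main__":
    depth_map = [[1] * 4 for _ in range(4)]  # flat surface at depth bin 1
    voxels = depth_to_occupancy(depth_map, depth_bins=3)
    ones_kernel = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
    features = relu(conv3d_valid(voxels, ones_kernel))
    print(features)  # [[[9.0, 9.0], [9.0, 9.0]]]
```

Consider a 4×4 depth map whose pixels all fall in depth bin 1 of 3. The occupied voxels form a single plane, so every 3×3×3 window of the all-ones kernel covers exactly nine of them, and each of the 1×2×2 outputs equals 9.0. A trained network would instead learn many such kernels and stack them with nonlinearities and pooling before a final classifier.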


Figures 1 to 22 are not reproduced in this preview.




Author information

Correspondence to Rahul Dev Singh.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Singh, R.D., Mittal, A. & Bhatia, R.K. 3D convolutional neural network for object recognition: a review. Multimed Tools Appl 78, 15951–15995 (2019). https://doi.org/10.1007/s11042-018-6912-6


  • DOI: https://doi.org/10.1007/s11042-018-6912-6
