2D/3D Object Recognition and Categorization Approaches for Robotic Grasping

  • Chapter
Advances in Soft Computing and Machine Learning in Image Processing

Part of the book series: Studies in Computational Intelligence ((SCI,volume 730))

Abstract

Object categorization and manipulation are critical tasks for a robot operating in a household environment. In this chapter, we propose new methods for visual recognition and categorization. We describe objects in 2D image databases and 3D point clouds with 2D/3D local descriptors, which we quantize with the k-means clustering algorithm to obtain a bag of words (BOW). Moreover, we develop a new global descriptor called VFH-Color, which combines the original Viewpoint Feature Histogram (VFH) descriptor with a color quantization histogram, thus adding appearance information that improves the recognition rate. The acquired 2D and 3D features are used to train a Deep Belief Network (DBN) classifier. Our experiments on object recognition and categorization show average recognition rates between 91% and 99%, which makes the approach well suited to robot-assisted tasks.
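The two feature-building ideas named in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation: the function names, the plain Lloyd's k-means, the uniform per-channel color quantization, and the bin counts are all assumptions chosen for illustration, and the shape histogram passed to `append_color` is only a stand-in for a real VFH vector. The sketch shows (1) clustering local descriptors into a vocabulary and counting word occurrences to get a BOW histogram, and (2) concatenating a shape histogram with a color quantization histogram.

```python
import numpy as np


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means (illustrative); returns a (k, d) centroid array."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct training descriptors.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every descriptor to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def bow_histogram(descriptors, centroids):
    """Assign each descriptor to its nearest visual word and count the words."""
    descriptors = np.asarray(descriptors, dtype=float)
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    return hist / hist.sum()


def color_quantization_histogram(rgb, bins_per_channel=4):
    """Uniformly quantize (n, 3) uint8 RGB values and return a normalised histogram."""
    q = (np.asarray(rgb).astype(np.int64) * bins_per_channel) // 256
    idx = (q[:, 0] * bins_per_channel + q[:, 1]) * bins_per_channel + q[:, 2]
    hist = np.bincount(idx, minlength=bins_per_channel ** 3).astype(float)
    return hist / hist.sum()


def append_color(shape_hist, rgb, bins_per_channel=4):
    """Concatenate a shape histogram (stand-in for VFH) with a color histogram."""
    color_hist = color_quantization_histogram(rgb, bins_per_channel)
    return np.concatenate([np.asarray(shape_hist, dtype=float), color_hist])
```

In a pipeline of this kind, the vocabulary would be built once from training descriptors with `kmeans`, each object would then be encoded with `bow_histogram`, and the resulting fixed-length vectors would be passed to whatever classifier is used. The concatenation in `append_color` is the simplest way to combine shape and appearance cues; the chapter itself should be consulted for how VFH-Color is actually constructed and weighted.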

Author information

Corresponding author

Correspondence to Nabila Zrira.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Zrira, N., Hannat, M., Bouyakhf, E.H., Ahmad Khan, H. (2018). 2D/3D Object Recognition and Categorization Approaches for Robotic Grasping. In: Hassanien, A., Oliva, D. (eds) Advances in Soft Computing and Machine Learning in Image Processing. Studies in Computational Intelligence, vol 730. Springer, Cham. https://doi.org/10.1007/978-3-319-63754-9_26

  • DOI: https://doi.org/10.1007/978-3-319-63754-9_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63753-2

  • Online ISBN: 978-3-319-63754-9

  • eBook Packages: Engineering, Engineering (R0)
