2D/3D Object Recognition and Categorization Approaches for Robotic Grasping

Zrira, Nabila; Hannat, Mohamed; Bouyakhf, El Houssine; Ahmad Khan, Haris

doi:10.1007/978-3-319-63754-9_26

Part of the book series: Studies in Computational Intelligence (SCI, volume 730)

Abstract

Object categorization and manipulation are critical tasks for a robot operating in a household environment. In this chapter, we propose new methods for visual recognition and categorization. We describe a 2D object database and 3D point clouds with 2D/3D local descriptors, which we quantize with the k-means clustering algorithm to obtain a bag of words (BOW). Moreover, we develop a new global descriptor called VFH-Color that combines the original Viewpoint Feature Histogram (VFH) descriptor with a color quantization histogram, thus adding appearance information that improves the recognition rate. The extracted 2D and 3D features are used to train a Deep Belief Network (DBN) classifier. In our object recognition and categorization experiments, the average recognition rate ranges from 91% to 99%, which makes the approach well suited to robot-assisted tasks.
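
The two feature-construction ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the descriptors are random stand-ins, the vocabulary size (10), the descriptor length (64), and the 4 bins per color channel are assumptions chosen for illustration, and the shape part is a random placeholder with the 308 bins of a VFH signature as computed by the Point Cloud Library. The helper names kmeans, bow_histogram, and color_histogram are hypothetical.

```python
import numpy as np


def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids (the visual words)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct descriptors chosen at random.
    centroids = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every descriptor to every centroid.
        d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids


def bow_histogram(descriptors, centroids):
    """Assign each descriptor to its nearest word; return a normalised histogram."""
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    return hist / hist.sum()


def color_histogram(rgb, bins_per_channel=4):
    """Quantise 8-bit RGB values into bins_per_channel**3 colours and count them."""
    q = (rgb.astype(np.int64) * bins_per_channel) // 256  # per-channel bin index
    codes = (q[:, 0] * bins_per_channel + q[:, 1]) * bins_per_channel + q[:, 2]
    hist = np.bincount(codes, minlength=bins_per_channel ** 3).astype(float)
    return hist / hist.sum()


rng = np.random.default_rng(42)

# Bag of words: learn a vocabulary, then describe one object as a word histogram.
train_desc = rng.normal(size=(500, 64))   # stand-in for local descriptors
vocab = kmeans(train_desc, k=10)
object_desc = rng.normal(size=(80, 64))   # stand-in for one object's descriptors
bow = bow_histogram(object_desc, vocab)

# Shape plus colour: concatenate a shape signature with a colour histogram.
shape_part = rng.random(308)              # placeholder for a VFH signature (assumed 308 bins)
points_rgb = rng.integers(0, 256, size=(1000, 3))
combined = np.concatenate([shape_part, color_histogram(points_rgb)])

print(bow.shape, combined.shape)
```

Either fixed-length vector (the BOW histogram or the concatenated shape-and-color vector) could then be fed to a classifier; the chapter uses a DBN for that step, which this sketch does not cover.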

Author information

Authors and Affiliations

LIMIARF Laboratory, Faculty of Sciences Rabat, Mohammed V University Rabat, Rabat, Morocco
Nabila Zrira, Mohamed Hannat & El Houssine Bouyakhf
NTNU, Norwegian University of Science and Technology, Gjøvik, Norway
Haris Ahmad Khan

Corresponding author

Correspondence to Nabila Zrira.

Editor information

Editors and Affiliations

Faculty of Computers and Information, Information Technology Department, Cairo University, Giza, Egypt
Aboul Ella Hassanien
CUCEI, Departamento de Ciencias Computacionales, Universidad de Guadalajara, Guadalajara, Mexico
Diego Alberto Oliva

About this chapter

Cite this chapter

Zrira, N., Hannat, M., Bouyakhf, E.H., Ahmad Khan, H. (2018). 2D/3D Object Recognition and Categorization Approaches for Robotic Grasping. In: Hassanien, A., Oliva, D. (eds) Advances in Soft Computing and Machine Learning in Image Processing. Studies in Computational Intelligence, vol 730. Springer, Cham. https://doi.org/10.1007/978-3-319-63754-9_26

DOI: https://doi.org/10.1007/978-3-319-63754-9_26
Published: 15 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63753-2
Online ISBN: 978-3-319-63754-9
eBook Packages: Engineering, Engineering (R0)
