Robotic Understanding of Object Semantics by Referring to a Dictionary


Scene understanding is a fundamental challenge for intelligent robots, especially for social robots, which are expected to have human-like perception, comprehension, and knowledge. This paper proposes an approach that enables robots not only to detect objects in a scene but also to understand and reason about their working environments. The proposed method has three parts: object detection, object semantic comprehension, and feedback on robotic comprehension. Semantic comprehension is based on dictionary definitions of objects: the category, function, property, and composition of each detected object are analyzed, and these four elements help the robot comprehend the target object in detail. The experimental section of this paper discusses the applicability of the proposed method to robots.
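As a rough illustration of the semantic comprehension step, the sketch below pulls the four elements (category, function, property, composition) out of a dictionary-style definition using simple cue phrases. The abstract does not say how the authors extract these elements, so this is not their method: the cue phrases, the `ObjectSemantics` class, the `parse_definition` function, and the sample definition are all hypothetical and chosen only to make the idea concrete.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectSemantics:
    """The four elements named in the abstract (field names are illustrative)."""
    category: Optional[str]
    function: Optional[str]
    properties: Optional[str]
    composition: Optional[str]


# Hypothetical cue phrases; a real system would need proper linguistic parsing.
_COMPOSITION = re.compile(r"\b(?:made of|made from|consisting of)\s+([^,;.]+)")
_FUNCTION = re.compile(r"\b(?:used for|used to|for)\s+([^,;.]+)")
_HEAD_END = re.compile(
    r",|;|\.|\b(?:made of|made from|consisting of|used for|used to|for|that|which|with)\b"
)
_ARTICLES = {"a", "an", "the"}


def _first(pattern: re.Pattern, text: str) -> Optional[str]:
    """Return the first captured group of the first match, or None."""
    match = pattern.search(text)
    return match.group(1).strip() if match else None


def parse_definition(definition: str) -> ObjectSemantics:
    """Split a definition into category, function, properties, and composition."""
    text = definition.strip().lower()
    # Leading noun phrase: everything before the first cue phrase or punctuation.
    head = _HEAD_END.split(text, maxsplit=1)[0]
    words = [w for w in head.split() if w not in _ARTICLES]
    # Assumption: the last word of the leading phrase names the category,
    # and any words before it are treated as properties.
    category = words[-1] if words else None
    properties = " ".join(words[:-1]) or None
    return ObjectSemantics(
        category=category,
        function=_first(_FUNCTION, text),
        properties=properties,
        composition=_first(_COMPOSITION, text),
    )


if __name__ == "__main__":
    # Made-up definition for a detected "cup"; not quoted from any dictionary.
    print(parse_definition("A small container made of ceramic, used for drinking."))
```

In a full pipeline, the label produced by the object detector would be used to look up the definition, and the parsed elements would feed the robot's feedback step; both of those stages are outside this sketch.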






This work has been supported by the Wichita Medical Research and Education Foundation and the Regional Institute on Aging (Grant No. 20,000).

Author information



Corresponding author

Correspondence to Hongsheng He.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.



About this article


Cite this article

Yan, F., Tran, D.M. & He, H. Robotic Understanding of Object Semantics by Referring to a Dictionary. Int J of Soc Robotics (2020).



  • Autonomous robots
  • Robot reasoning
  • Semantic comprehension
  • Human–robot interaction (HRI)
  • Scene comprehension