Semantic Active Visual Search System Based on Text Information for Large and Unknown Environments


Many high-level robotics tasks require the robot to manipulate or interact with objects that lie in an unexplored part of the environment or outside its current field of view. Although many works search for objects based on their colour or 3D context, we argue that text is a useful and functional visual cue to guide the search. In this paper, we study the problem of active visual search (AVS) in large unknown environments and present an AVS system that relies on semantic information inferred from text found in the environment, which allows the robot to reduce search costs by avoiding regions that are unpromising for the target object. Our semantic planner reasons over the numbers detected on door signs to decide whether to perform a goal-directed exploration towards unknown parts of the environment or to carefully search the already known parts. We compared the performance of our semantic AVS system with two other search systems in four simulated environments. First, we developed a greedy search system that does not consider any semantic information, and second, we invited human participants to teleoperate the robot while performing the search. Our results from simulation and real-world experiments show that text is a promising source of information that provides different semantic cues for AVS systems.
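The choice the planner makes between exploring and searching can be illustrated with a minimal sketch. It is not the planner described in this paper: the names `Decision` and `decide` are hypothetical, and the rule used here (search the known area only when the target room number lies within the range of door numbers already seen) is a simplifying assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Decision:
    """Hypothetical result type; not taken from the paper."""
    action: str                    # "search_known" or "explore_unknown"
    reference_door: Optional[int]  # seen door number closest to the target


def decide(target_room: int, seen_doors: Sequence[int]) -> Decision:
    """Pick between careful search and goal-directed exploration.

    Assumption: room numbers vary monotonically along corridors, so a
    target inside the range of observed numbers is probably in the
    already known part of the map.
    """
    if not seen_doors:
        # No door sign read yet: nothing to reason over, keep exploring.
        return Decision("explore_unknown", None)

    # Door sign whose number is numerically closest to the target.
    closest = min(seen_doors, key=lambda door: abs(door - target_room))

    if min(seen_doors) <= target_room <= max(seen_doors):
        # Target number is bracketed by signs already seen.
        return Decision("search_known", closest)

    # Target lies beyond the observed range: explore, starting near
    # the closest known door.
    return Decision("explore_unknown", closest)
```

For example, with doors 101, 103 and 109 observed, a target of 105 leads to a careful search near door 103, while a target of 120 leads to exploration starting from door 109.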

Data Availability

The code, data, and any other documents will be released in the repository of the Phi Robotics Research Lab as soon as all the files meet the Google Style Guide and are well documented.




This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES) - Finance Code 001, and CNPq. This work is also partially supported by the Research Council of Norway (RCN) as a part of the COINMAC project (grant agreement 261645), the MECS project (grant agreement 247697) and the VIROS project (grant agreement 288285).

Author information




MM, DP, RM, EP and MK conceived and designed the approach. MM and RM carried out the experiments, and the data analysis was performed by MM, DP, RM and MK. All authors wrote the manuscript and reviewed its final version.

Corresponding author

Correspondence to Mathias Mantelli.

Ethics declarations

Consent for Publication

The authors declare there is no conflict of interest in this paper.

Consent to Participate and to Publish

Before the experiments were carried out, all participants who took part in them signed a consent form. The form explains the experiment, its goal, and the use of the data the participants generated. It also contains information about the confidentiality and security of their participation.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mantelli, M., Pittol, D., Maffei, R. et al. Semantic Active Visual Search System Based on Text Information for Large and Unknown Environments. J Intell Robot Syst 101, 32 (2021).



  • Semantic information
  • Active search
  • Visual search problem