Grounding Verbs of Motion in Natural Language Commands to Robots

  • Thomas Kollar
  • Stefanie Tellex
  • Deb Roy
  • Nicholas Roy
Part of the Springer Tracts in Advanced Robotics book series (STAR, volume 79)


To be useful teammates to human partners, robots must be able to follow spoken instructions given in natural language. An important class of instructions involves interacting with people, such as “Follow the person to the kitchen” or “Meet the person at the elevators.” These instructions require that the robot react fluidly to changes in the environment rather than simply follow a pre-computed plan. We present an algorithm for understanding natural language commands with three components. First, we create a cost function that scores the language according to how well it matches a candidate plan in the environment, defined as the log-likelihood of the plan given the command. Components of the cost function include novel models for the meanings of motion verbs such as “follow,” “meet,” and “avoid,” as well as spatial relations such as “to” and landmark phrases such as “the kitchen.” Second, an inference method uses this cost function to perform forward search, finding a plan that matches the natural language command. Third, a high-level controller calls the inference method at each timestep to compute a new plan in response to changes in the environment, such as the movement of the human partner or other people in the scene. When a command consists of more than a single task, the controller switches to the next task once the earlier one is satisfied. We evaluate our approach on a set of example tasks that require the ability to follow both simple and complex natural language commands.
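The three components described above (cost function, forward search, replanning controller) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the grid world, the names `follow_cost`, `forward_search`, and `controller`, and the use of summed Manhattan distance as a toy stand-in for the negative log-likelihood of a plan given the verb “follow.” Multi-task switching is left out.

```python
from itertools import product

# Hypothetical action set: stay in place or move one cell on a 4-connected grid.
MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def follow_cost(robot_path, person_pos):
    """Toy cost for "follow" (an assumption, not the learned model):
    summed Manhattan distance between each planned robot cell and the person."""
    px, py = person_pos
    return sum(abs(x - px) + abs(y - py) for x, y in robot_path)


def forward_search(robot_pos, person_pos, horizon=3):
    """Enumerate every move sequence of length `horizon` and return the
    lowest-cost path (the list of cells visited after each move)."""
    best_path, best_cost = None, float("inf")
    for moves in product(MOVES, repeat=horizon):
        x, y = robot_pos
        path = []
        for dx, dy in moves:
            x, y = x + dx, y + dy
            path.append((x, y))
        cost = follow_cost(path, person_pos)
        if cost < best_cost:
            best_path, best_cost = path, cost
    return best_path


def controller(robot_pos, person_track, horizon=3):
    """Replan at every timestep against the person's latest observed position
    and execute only the first step of each new plan."""
    trajectory = [robot_pos]
    for person_pos in person_track:
        plan = forward_search(robot_pos, person_pos, horizon)
        robot_pos = plan[0]
        trajectory.append(robot_pos)
    return trajectory


if __name__ == "__main__":
    # The person walks upward along x = 3 while the robot starts at the origin.
    print(controller((0, 0), [(3, 0), (3, 1), (3, 2), (3, 3)]))
```

Because the controller executes only the first step of each plan and then searches again, the robot's path bends toward the person's most recent position instead of committing to a plan computed at the start.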


Keywords: Cost Function; Spatial Relation; State Sequence; Edit Distance; Statistical Machine Translation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2014

Authors and Affiliations

  • Thomas Kollar (1)
  • Stefanie Tellex (2)
  • Deb Roy (2)
  • Nicholas Roy (1)

  1. Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, USA
  2. MIT Media Lab, Cambridge, USA
