Machine Learning, Volume 94, Issue 2, pp 151–167

Learning perceptually grounded word meanings from unaligned parallel data

  • Stefanie Tellex
  • Pratiksha Thaker
  • Joshua Joseph
  • Nicholas Roy


For robots to understand natural language commands effectively, they must be able to acquire meaning representations that can be mapped to perceptual features in the external world. Previous approaches to learning these grounded meaning representations require detailed annotations at training time. In this paper, we present an approach to grounded language acquisition that jointly learns a policy for following natural language commands such as “Pick up the tire pallet,” and a mapping between specific phrases in the language and aspects of the external world; for example, the mapping between the words “the tire pallet” and a specific object in the environment. Our approach assumes a parametric form for the policy that the robot uses to choose actions in response to a natural language command; this form factors according to the structure of the language. We use a gradient method to optimize model parameters. Our evaluation demonstrates the effectiveness of the model on a corpus of commands given to a robotic forklift by untrained users.


Keywords: Robotics, Language, Machine learning, Probabilistic graphical models



We would like to thank the anonymous reviewers for their insightful comments, which significantly shaped the paper. We would also like to thank Thomas Howard for his helpful comments on a draft of this paper, and Matthew R. Walter for his help in collecting the corpus. This work was sponsored by the Robotics Consortium of the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement W911NF-10-2-0016; by the Office of Naval Research under MURIs N00014-07-1-0749 and N00014-11-1-0688; and by the DARPA BOLT program under contract HR0011-11-2-0008.



Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Stefanie Tellex (1)
  • Pratiksha Thaker (1)
  • Joshua Joseph (1)
  • Nicholas Roy (1)

  1. Cambridge, USA
