Learning perceptually grounded word meanings from unaligned parallel data
For robots to understand natural language commands effectively, they must be able to acquire meaning representations that can be mapped to perceptual features in the external world. Previous approaches to learning these grounded meaning representations require detailed annotations at training time. In this paper, we present an approach to grounded language acquisition that jointly learns a policy for following natural language commands such as “Pick up the tire pallet,” and a mapping between specific phrases in the language and aspects of the external world; for example, the mapping between the words “the tire pallet” and a specific object in the environment. Our approach assumes a parametric form for the policy that the robot uses to choose actions in response to a natural language command; this form factors according to the structure of the language. We use a gradient method to optimize the model parameters. Our evaluation demonstrates the effectiveness of the model on a corpus of commands given to a robotic forklift by untrained users.
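To make the idea of a parametric policy optimized by a gradient method concrete, the following is a minimal sketch and not the model from the paper. It assumes a simple log-linear policy over a handful of candidate actions and takes gradient-ascent steps on the log-probability of a demonstrated action. The feature vectors, the function names `policy` and `gradient_step`, and the learning rate are all hypothetical, and the sketch does not factor the policy according to the structure of the language.

```python
import math

def policy(theta, features):
    """Log-linear policy: P(action) is proportional to exp(theta . f(action))."""
    scores = [sum(t * f for t, f in zip(theta, fa)) for fa in features]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gradient_step(theta, features, demonstrated, lr=0.5):
    """One gradient-ascent step on log P(demonstrated action).

    For a log-linear model the gradient is the demonstrated action's
    features minus the expected features under the current policy.
    """
    probs = policy(theta, features)
    expected = [
        sum(p * fa[k] for p, fa in zip(probs, features))
        for k in range(len(theta))
    ]
    return [
        t + lr * (features[demonstrated][k] - expected[k])
        for k, t in enumerate(theta)
    ]

# Hypothetical toy features for the command "Pick up the tire pallet":
# each candidate action is described by
# [target object is a tire pallet, target object is a box].
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
theta = [0.0, 0.0]

# Assume the demonstration shows action 0 (picking up the tire pallet).
for _ in range(50):
    theta = gradient_step(theta, features, demonstrated=0)

probs = policy(theta, features)
```

After training on this single toy demonstration, the policy concentrates most of its probability mass on the demonstrated action, which is the behavior a gradient method on a likelihood objective is expected to produce.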
Keywords: Robotics, Language, Machine learning, Probabilistic graphical models
We would like to thank the anonymous reviewers for their insightful comments, which significantly shaped the paper. We would also like to thank Thomas Howard for his helpful comments on a draft of this paper, and Matthew R. Walter for his help in collecting the corpus. This work was sponsored by the Robotics Consortium of the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement W911NF-10-2-0016; by the Office of Naval Research under MURIs N00014-07-1-0749 and N00014-11-1-0688; and by the DARPA BOLT program under contract HR0011-11-2-0008.