Empirically evaluating the application of reinforcement learning to the induction of effective and adaptive pedagogical strategies

  • Min Chi
  • Kurt VanLehn
  • Diane Litman
  • Pamela Jordan
Original Paper

Abstract

In many e-learning environments, the system's behavior can be viewed as a sequential decision process in which, at each discrete step, the system must select the next action to take. Pedagogical strategies are policies that decide the next system action when several are available. In this paper we present a Reinforcement Learning (RL) approach for inducing effective pedagogical strategies, together with empirical evaluations of the induced strategies. We address the technical challenges of applying RL to Cordillera, a Natural Language Tutoring System that teaches students introductory college physics. The algorithm chosen for this project is a model-based RL approach, Policy Iteration. The training corpus is an exploratory corpus, collected by letting the system make random decisions while interacting with real students. Overall, our results show that, even with a rather small training corpus, the RL-induced strategies measurably improved the effectiveness of Cordillera: students tutored under the RL-induced policies achieved significantly higher learning gains.
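The model-based approach described in the abstract can be illustrated with a minimal sketch: estimate a Markov decision process from logged (state, action, reward, next state) tuples, then run Policy Iteration on the estimated model. Everything specific below is an assumption made for illustration and is not taken from the paper: the state names (`low`, `high`, `done`), the action names (`elicit`, `tell`), the toy corpus, the rewards, the discount factor, and the treatment of unseen state-action pairs as zero-reward self-loops.

```python
from collections import defaultdict


def estimate_mdp(corpus, states, actions):
    """Maximum-likelihood transition and reward estimates from (s, a, r, s') tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s_next in corpus:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
    P, R = {}, {}
    for s in states:
        for a in actions:
            total = sum(counts[(s, a)].values())
            if total == 0:
                # Assumption: an unseen pair is modeled as a zero-reward self-loop.
                P[(s, a)] = {s: 1.0}
                R[(s, a)] = 0.0
            else:
                P[(s, a)] = {s2: c / total for s2, c in counts[(s, a)].items()}
                R[(s, a)] = reward_sum[(s, a)] / total
    return P, R


def q_value(P, R, V, s, a, gamma):
    """Expected return of taking action a in state s, then following values V."""
    return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())


def policy_iteration(P, R, states, actions, gamma=0.9, tol=1e-8):
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    policy = {s: actions[0] for s in states}
    V = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: sweep the Bellman expectation update to convergence.
        while True:
            delta = 0.0
            for s in states:
                v = q_value(P, R, V, s, policy[s], gamma)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: switch only when another action is strictly better.
        stable = True
        for s in states:
            best = max(actions, key=lambda a: q_value(P, R, V, s, a, gamma))
            if q_value(P, R, V, s, best, gamma) > q_value(P, R, V, s, policy[s], gamma) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Hypothetical exploratory corpus: actions were chosen at random during collection.
STATES = ["low", "high", "done"]
ACTIONS = ["elicit", "tell"]
CORPUS = (
    [("low", "tell", 0.0, "high")] * 3 + [("low", "tell", 0.0, "low")]
    + [("low", "elicit", 0.0, "high")] + [("low", "elicit", 0.0, "low")] * 3
    + [("high", "elicit", 10.0, "done")] * 3 + [("high", "elicit", 0.0, "low")]
    + [("high", "tell", 2.0, "done")] * 4
)

P, R = estimate_mdp(CORPUS, STATES, ACTIONS)
policy, V = policy_iteration(P, R, STATES, ACTIONS)
print(policy)
```

In this toy setting the induced policy maps each state to a single action, which is the sense in which a pedagogical strategy is a policy. A real application would also need a principled state representation and reward definition, which this sketch does not attempt to reproduce.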

Keywords

  • Reinforcement learning
  • Pedagogical strategy
  • Machine learning
  • Human learning

Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Min Chi (1)
  • Kurt VanLehn (2)
  • Diane Litman (3, 4)
  • Pamela Jordan (4)

  1. Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA
  2. School of Computing, Informatics and Decision Science Engineering, Arizona State University, Tempe, USA
  3. Department of Computer Science and Intelligent Systems Program, University of Pittsburgh, Pittsburgh, USA
  4. Learning Research and Development Center, University of Pittsburgh, Pittsburgh, USA
