
The details matter: methodological nuances in the evaluation of student models

  • Radek Pelánek

Abstract

The core of student modeling research lies in capturing complex learning processes in an abstract mathematical model. Student modeling research, however, also involves important methodological aspects. Some of these aspects may seem like technical details not worth significant attention; however, the details matter. We discuss three important methodological issues in student modeling: the impact of data collection, the splitting of data into a training set and a test set, and the details of averaging in the computation of predictive accuracy metrics. We explicitly identify the decisions involved in these steps, illustrate how these decisions can influence the results of experiments, and discuss the consequences for future research in student modeling.
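
To make the third issue concrete, here is a minimal sketch (not taken from the paper; the data and the choice of RMSE as the metric are illustrative). It shows how two plausible ways of averaging the same metric, pooling all answers versus computing a per-student value first, can yield different results when students answer different numbers of items.

```python
# Minimal sketch: two averaging schemes for a predictive accuracy metric (RMSE).
# Hypothetical per-student data: lists of (predicted probability, observed outcome).
import math

predictions = {
    "student_A": [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.9, 1)],
    "student_B": [(0.4, 1), (0.3, 0)],
}

def rmse(pairs):
    """Root mean square error over a list of (prediction, outcome) pairs."""
    return math.sqrt(sum((p - o) ** 2 for p, o in pairs) / len(pairs))

# Scheme 1: pool all observations, then compute a single global RMSE.
all_pairs = [pair for pairs in predictions.values() for pair in pairs]
global_rmse = rmse(all_pairs)

# Scheme 2: compute RMSE per student, then average across students.
per_student_rmse = sum(rmse(pairs) for pairs in predictions.values()) / len(predictions)

print(f"global RMSE:      {global_rmse:.3f}")   # ~0.407
print(f"per-student RMSE: {per_student_rmse:.3f}")  # ~0.426, differs for unbalanced data
```

With unbalanced data the global average is dominated by students with many answers, while the per-student average weights all students equally; reported model comparisons can depend on which scheme is used.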

Keywords

Student modeling · Evaluation · Data · Metrics · Model comparison

Acknowledgements

The author thanks members of the Adaptive Learning group at Masaryk University for interesting discussions about methodological issues in the evaluation of adaptive learning systems, particularly Jan Papoušek, Jiří Řihák and Juraj Nižnan, who performed some of the experiments on which the discussion is based.


Copyright information

© Springer Nature B.V. 2018

Authors and Affiliations

  1. Faculty of Informatics, Masaryk University, Brno, Czech Republic
