Identifying Feature Sequences from Process Data in Problem-Solving Items with N-Grams

  • Qiwei He
  • Matthias von Davier
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 140)


This article draws on process data from a computer-based large-scale assessment program, the Programme for the International Assessment of Adult Competencies (PIAAC), to address how sequences of actions recorded in problem-solving tasks are related to task performance and how feature sequences can be identified for different groups. The purpose of this study is twofold: first, to explore and detect action sequence patterns that are associated with success or failure on a problem-solving item, and second, to cross-validate the results derived from two feature selection models. Motivated by methodologies from natural language processing and text mining, we used an n-gram model and two feature selection methods, the chi-square statistic (CHI) and the weighted log likelihood ratio test (WLLR), to analyze the process data at a variety of aggregate levels. Action sequence patterns differed significantly between performance groups and were consistent across countries. The two feature selection approaches showed high agreement in the features they identified.


Keywords: Process data; Computer-based assessment; N-gram; Chi-square selection; Weighted log likelihood ratio; Problem-solving item
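
The approach summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the action names, the toy sequences, and the function names are hypothetical, and the scoring formulas are the standard text-categorization definitions (a 2x2 chi-square statistic and a weighted log likelihood ratio), assumed here because the abstract does not state the exact variants. Probabilities are estimated as the add-one-smoothed share of sequences in a group that contain a given n-gram, which is also an assumption.

```python
import math
from collections import Counter


def ngrams(actions, n):
    """Return all contiguous n-grams (as tuples) in one action sequence."""
    return [tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)]


def sequence_frequencies(sequences, n):
    """Count how many sequences contain each n-gram at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update(set(ngrams(seq, n)))
    return counts


def chi_square(a, b, n_a, n_b):
    """2x2 chi-square statistic for one feature.

    a, b: number of sequences containing the feature in groups A and B.
    n_a, n_b: total number of sequences in groups A and B.
    """
    c, d = n_a - a, n_b - b          # sequences lacking the feature
    total = n_a + n_b
    denom = n_a * n_b * (a + b) * (c + d)
    return total * (a * d - b * c) ** 2 / denom if denom else 0.0


def wllr(a, b, n_a, n_b):
    """Weighted log likelihood ratio of a feature for group A versus B.

    Assumption: P(feature | group) is the add-one-smoothed share of
    sequences in the group that contain the feature.
    """
    p_a = (a + 1) / (n_a + 2)
    p_b = (b + 1) / (n_b + 2)
    return p_a * math.log(p_a / p_b)


def score_features(group_a, group_b, n=2):
    """Map each n-gram to its (chi-square, WLLR) scores."""
    freq_a = sequence_frequencies(group_a, n)
    freq_b = sequence_frequencies(group_b, n)
    n_a, n_b = len(group_a), len(group_b)
    return {
        f: (chi_square(freq_a[f], freq_b[f], n_a, n_b),
            wllr(freq_a[f], freq_b[f], n_a, n_b))
        for f in set(freq_a) | set(freq_b)
    }


# Hypothetical action logs for respondents who solved or failed an item.
correct = [
    ["START", "SEARCH", "SORT", "SUBMIT"],
    ["START", "SORT", "SUBMIT"],
    ["START", "SEARCH", "SORT", "SUBMIT"],
]
incorrect = [
    ["START", "HELP", "SUBMIT"],
    ["START", "SEARCH", "HELP", "SUBMIT"],
    ["START", "HELP", "HELP", "SUBMIT"],
]

scores = score_features(correct, incorrect, n=2)
for feature, (chi, ratio) in sorted(scores.items(), key=lambda kv: -kv[1][0]):
    print(feature, round(chi, 3), round(ratio, 3))
```

In this sketch the chi-square statistic only measures how strongly a bigram is associated with group membership, while the sign of the WLLR score indicates which group it favors: positive for the first group passed to `score_features`, negative for the second. Comparing the two rankings is one simple way to check whether both methods select the same features.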



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Research and Development Department, Global Assessment, Educational Testing Service, Princeton, USA
