An Application of a Topic Model to Two Educational Assessments

  • Hye-Jeong Choi (Email author)
  • Minho Kwak
  • Seohyun Kim
  • Jiawei Xiong
  • Allan S. Cohen
  • Brian A. Bottge
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 265)


A topic model is a statistical model for extracting latent clusters or themes from the text in a collection of documents. The purpose of this study was to apply a topic model to two educational assessments. In the first study, the model was applied to students’ written responses to an extended response item on an English Language Arts (ELA) test. In the second study, a topic model was applied to the errors students made on a fractions computation test. The first study detected five distinct writing patterns in students’ writing on the ELA test. Two of the patterns were related to low scores, two were associated with high scores, and one was unrelated to the score on the test. In the second study, five error patterns (i.e., latent topics) were detected on the pre-test and six on the post-test of the fractions computation test. The results of Study 2 also yielded evidence of instructional effects on students’ fractions computation ability. Following instruction, more students in the experimental instruction condition made fewer errors than students in the business-as-usual condition.
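To make the idea of extracting latent topics concrete, the following is a minimal sketch of one common topic model, latent Dirichlet allocation, fitted with a collapsed Gibbs sampler. It is an illustration only and should not be read as the procedure used in the study: the function name fit_lda, the hyperparameter values, and the toy error codes are all hypothetical, and each "document" is assumed to be the list of error codes recorded for one student.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA on tokenized documents (illustrative only)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # word v assigned to topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimate of each document's topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word, vocab


# Hypothetical data: one list of error codes per student.
docs = [
    ["added_denominators", "added_denominators", "no_common_denominator"],
    ["added_denominators", "no_common_denominator", "no_common_denominator"],
    ["did_not_simplify", "did_not_simplify", "wrong_operation"],
    ["did_not_simplify", "wrong_operation", "wrong_operation"],
]
theta, topic_word, vocab = fit_lda(docs, n_topics=2)
for student, proportions in enumerate(theta):
    print(student, [round(p, 2) for p in proportions])
```

Each row of theta gives one student's estimated mix of latent error patterns, and each row of topic_word shows which error codes a pattern tends to contain. With real data, the number of topics would have to be chosen by comparing model fit rather than fixed in advance as it is here.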


Topic models; Extended response items; Error analysis



The fractions computation data used in the article were collected with the following support: the U.S. Department of Education, Institute of Education Sciences, PR Number H324A090179.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Hye-Jeong Choi (1), Email author
  • Minho Kwak (1)
  • Seohyun Kim (1)
  • Jiawei Xiong (1)
  • Allan S. Cohen (1)
  • Brian A. Bottge (2)
  1. University of Georgia, Athens, USA
  2. University of Kentucky, Lexington, USA
