Predicting Student Performance from Combined Data Sources

Part of the Studies in Computational Intelligence book series (SCI, volume 524)


This chapter explores the use of predictive modeling methods for identifying students who would benefit most from tutor interventions. This is a growing area of research and is especially useful in distance learning, where tutors and students do not meet face to face. The methods discussed include decision-tree classification, support vector machines (SVM), the general unary hypotheses automaton (GUHA), Bayesian networks, and linear and logistic regression. These methods have been trialed by building and testing predictive models using data from several Open University (OU) modules. The Open University offers a good test bed for this work, as it is one of the largest distance learning institutions in Europe. The chapter discusses how the predictive capacity of the different sources of data changes as the course progresses. It also highlights the importance of understanding how a student’s pattern of behavior changes during the course.


  • Predictive modeling
  • Education
  • Virtual learning environment
  • Student outcome


  • Analysis of variance
  • Course management system
  • Course Signals
  • General unary hypotheses automaton
  • Massive open online course
  • Open University
  • Support vector machine
  • Tutor marked assessment
  • Virtual learning environment



We would like to acknowledge the help and support of JISC and the contribution from Microsoft Research.

Author information


Corresponding author

Correspondence to Annika Wolff.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Wolff, A., Zdrahal, Z., Herrmannova, D., Knoth, P. (2014). Predicting Student Performance from Combined Data Sources. In: Peña-Ayala, A. (eds) Educational Data Mining. Studies in Computational Intelligence, vol 524. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02737-1

  • Online ISBN: 978-3-319-02738-8

  • eBook Packages: Engineering, Engineering (R0)