Abstract
Predicting student failure in university courses can provide useful information for course and programme managers and can also help explain the dropout phenomenon. While it is important to have models at course level, their large number makes it hard to extract knowledge that is useful at university level. To support decision making at that level, the knowledge contained in those models therefore needs to be generalized. We propose an approach to group and merge interpretable models in order to replace them with more general ones without compromising predictive performance. We evaluate our approach using data from the University of Porto (U. Porto). The results obtained are promising, although they also suggest alternative approaches to the problem.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Strecht, P., Mendes-Moreira, J., Soares, C. (2014). Merging Decision Trees: A Case Study in Predicting Student Performance. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, vol 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_42
DOI: https://doi.org/10.1007/978-3-319-14717-8_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14716-1
Online ISBN: 978-3-319-14717-8
eBook Packages: Computer Science, Computer Science (R0)