Improving the Ranking Performance of Decision Trees

Wang, Bin; Zhang, Harry

doi:10.1007/11871842_44

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4212)

Included in the following conference series:

European Conference on Machine Learning

Abstract

An accurate ranking of instances based on their class probabilities, as measured by AUC (the area under the Receiver Operating Characteristics curve), is desired in many applications. In a traditional decision tree, two obstacles prevent accurate rankings: the sample size at a leaf is small, and all instances falling into the same leaf are assigned the same class probability. In this paper, we propose two techniques to address these issues. First, we use shrinkage, a statistical technique that estimates the class probability of a test instance as a linear interpolation of the local class probabilities at each node along the path from the leaf to the root. We also present an efficient algorithm for learning the interpolating weights. Second, we introduce an instance-based method, weighted probability estimation (WPE), that generates distinct local probability estimates for test instances falling into the same leaf. The key idea is to weight training instances by their similarity to the test instance when estimating probabilities. Furthermore, we combine shrinkage and WPE so that each compensates for the other's weaknesses. Our experiments show that both shrinkage and WPE improve the ranking performance of decision trees, and that their combination works even better. The experiments also indicate that various decision tree algorithms combined with shrinkage and WPE significantly outperform both the original algorithms and other state-of-the-art techniques proposed to enhance the ranking performance of decision trees.
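Why a shared leaf probability hurts ranking can be seen from how AUC is computed. The sketch below uses the standard two-class, rank-based (Mann-Whitney) formulation: the fraction of positive-negative pairs in which the positive instance receives the higher score, with ties counted as one half. It is a generic illustration of the metric, not the evaluation code used in the paper; the function name and toy scores are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Two-class AUC as the Mann-Whitney statistic: the probability that a
    randomly chosen positive (label 1) outscores a randomly chosen negative
    (label 0). Tied scores earn half credit."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie cannot be resolved by the ranking
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): the first three instances share one leaf,
# so a plain decision tree gives them the same probability.
tied = auc([0.6, 0.6, 0.6, 0.1], [1, 1, 0, 0])       # 0.75
distinct = auc([0.7, 0.65, 0.5, 0.1], [1, 1, 0, 0])  # 1.0
```

With tied scores, the positives and the negative that share a leaf earn only half credit against each other; giving them distinct scores, which is what WPE aims to do, lets the ranking separate them. The double loop is quadratic and meant only for illustration.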
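The shrinkage and WPE estimators summarized in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not settle. Each node's local estimate is a Laplace-corrected class frequency. The interpolating weights are learned by EM on held-out instances, as in McCallum et al.'s shrinkage over a hierarchy of classes, which may differ from the paper's own efficient algorithm. WPE weights each training instance in the leaf by a hypothetical similarity, the fraction of nominal attribute values it shares with the test instance. All function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: an instance is a tuple of nominal attribute values.
Attrs = Tuple[str, ...]
Dist = Dict[str, float]


def laplace_estimate(labels: Sequence[str], classes: Sequence[str]) -> Dist:
    """Laplace-corrected class frequencies at one node (an assumed smoothing choice)."""
    k = len(classes)
    return {c: (sum(1 for y in labels if y == c) + 1) / (len(labels) + k) for c in classes}


def shrink(path: List[Dist], weights: List[float]) -> Dist:
    """Shrinkage: linear interpolation of node estimates along the leaf-to-root
    path. path[0] is the leaf and path[-1] the root; weights must sum to 1."""
    if len(path) != len(weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per node, summing to 1")
    return {c: sum(w * node[c] for w, node in zip(weights, path)) for c in path[0]}


def em_weights(true_class_probs: List[List[float]], iterations: int = 100) -> List[float]:
    """Assumed weight learner: EM on held-out data (McCallum-style shrinkage),
    not necessarily the paper's algorithm. true_class_probs[i][j] is the
    probability node j gives the true class of held-out instance i; all
    instances reach the same leaf. Laplace smoothing keeps every entry > 0."""
    m = len(true_class_probs[0])
    weights = [1.0 / m] * m
    for _ in range(iterations):
        totals = [0.0] * m
        for probs in true_class_probs:
            mix = sum(w * p for w, p in zip(weights, probs))
            for j in range(m):
                totals[j] += weights[j] * probs[j] / mix  # responsibility of node j
        weights = [t / len(true_class_probs) for t in totals]
    return weights


def similarity(x: Attrs, z: Attrs) -> float:
    """Hypothetical similarity: fraction of attribute values shared by x and z."""
    return sum(a == b for a, b in zip(x, z)) / len(x)


def wpe(test: Attrs, leaf: Sequence[Tuple[Attrs, str]], classes: Sequence[str]) -> Dist:
    """WPE sketch: each training instance in the leaf counts with a weight equal
    to its similarity to the test instance (Laplace-smoothed weighted frequencies)."""
    mass = {c: 0.0 for c in classes}
    for x, y in leaf:
        mass[y] += similarity(test, x)
    total, k = sum(mass.values()), len(classes)
    return {c: (mass[c] + 1) / (total + k) for c in classes}


def shrink_with_wpe(test: Attrs, leaf: Sequence[Tuple[Attrs, str]], ancestors: List[Dist],
                    weights: List[float], classes: Sequence[str]) -> Dist:
    """Combination sketch: the leaf's WPE estimate replaces its plain frequency
    estimate, and the result is shrunk toward the ancestors' estimates."""
    return shrink([wpe(test, leaf, classes)] + ancestors, weights)


# Toy leaf (hypothetical data): two training instances, one per class.
leaf = [(("x", "p"), "+"), (("x", "q"), "-")]
print(wpe(("x", "p"), leaf, ["+", "-"]))  # higher P(+) than for ("x", "q")
print(wpe(("x", "q"), leaf, ["+", "-"]))
```

Both test instances reach the same leaf, yet the one matching the positive training instance on the second attribute receives the higher positive-class estimate, which is the tie-breaking a plain leaf frequency cannot provide. Passing the WPE estimate through `shrink_with_wpe` then smooths it toward the ancestors' estimates, addressing the small-sample problem as well.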

Similar content being viewed by others

Smooth estimation of the area under the ROC curve in multistage ranked set sampling

Article 06 January 2020

M. Mahdizadeh & Ehsan Zamanzade

Learning to improve medical decision making from imbalanced data without a priori cost

Article Open access 05 December 2014

Xiang Wan, Jiming Liu, … Tiejun Tong

Combining Ranking with Traditional Methods for Ordinal Class Imbalance

Author information

Authors and Affiliations

Faculty of Computer Science, University of New Brunswick, Canada
Bin Wang & Harry Zhang

Editor information

Editors and Affiliations

Knowledge Engineering Group, Technische Universität Darmstadt, Germany
Johannes Fürnkranz
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou

About this paper

Cite this paper

Wang, B., Zhang, H. (2006). Improving the Ranking Performance of Decision Trees. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science, vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_44

DOI: https://doi.org/10.1007/11871842_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5
eBook Packages: Computer Science, Computer Science (R0)
