Classification trees with soft splits optimized for ranking

  • Jakub Dvořák
Original Paper


We consider softening of splits in classification trees generated from multivariate numerical data. This methodology improves the quality of the ranking of the test cases, as measured by the AUC. Several ways to determine the softening parameters are introduced and compared, including the softening algorithm used in the standard methods C4.5 and C5.0. In the first part of the paper, a few softening settings determined only from the ranges of the training data in the tree branches are explored. The trees softened with these settings are used to study the effect of combining the Laplace correction with soft splits. In the second part we introduce methods which maximize the classifier's performance on the training set over the domain of the softening parameters. The non-linear optimization algorithm of Nelder and Mead is used, and various target functions are considered. The target function evaluating the AUC on the training set is compared with functions that sum a transformation of the score error over the training cases. Several data sets from the UCI repository are used in the experiments.
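To make the idea concrete, the following is a minimal sketch (not the paper's code) of how a soft split turns a single crisp decision stump into a scoring classifier: the two leaf class probabilities are blended by a logistic membership function whose steepness is the softening parameter. The logistic form, the toy data, and the fixed parameter value are illustrative assumptions; the AUC is computed pairwise in the Hanley–McNeil sense.

```python
import math

def soft_split_score(x, threshold, beta, p_left, p_right):
    # Soft split: instead of routing x entirely left or right of the
    # threshold, weight the two leaf class-1 probabilities by a logistic
    # membership function whose steepness beta is the softening parameter.
    w_right = 1.0 / (1.0 + math.exp(-beta * (x - threshold)))
    return (1.0 - w_right) * p_left + w_right * p_right

def hard_split_score(x, threshold, p_left, p_right):
    # Ordinary crisp split: every case in a branch gets that leaf's score.
    return p_right if x >= threshold else p_left

def auc(scores, labels):
    # AUC as the probability that a random positive outranks a random
    # negative, counting ties as 1/2.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

# Toy training set with one misclassified case on each side of the split at 0.
xs     = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [   0,    0,    1,   0,   1,   1]
p_left, p_right = 1/3, 2/3   # class-1 fractions in the two branches

hard = auc([hard_split_score(x, 0.0, p_left, p_right) for x in xs], labels)
soft = auc([soft_split_score(x, 0.0, 1.0, p_left, p_right) for x in xs], labels)
print(hard, soft)   # the soft scores break the ties and raise the AUC
```

In the setting the abstract describes, the softening parameter (here the fixed `beta = 1.0`) would instead be tuned by Nelder–Mead optimization of a target function such as the training-set AUC; this sketch only shows why graded scores can rank better than the two-valued scores of a crisp split.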


Keywords: Supervised learning · Decision trees · Scoring classifier



  1. Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth and Brooks, Monterey
  2. Carter C, Catlett J (1987) Assessing credit card applications using machine learning. IEEE Expert 2(3):71–79
  3. Chen M, Ludwig SA (2013) Fuzzy decision tree using soft discretization and a genetic algorithm based feature selection method. In: 2013 World congress on nature and biologically inspired computing (NaBIC). IEEE, pp 238–244
  4. Clémençon S, Depecker M, Vayatis N (2013) Ranking forests. J Mach Learn Res 14(1):39–73
  5. Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874
  6. Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1):29–36
  7. Hüllermeier E, Vanderlooy S (2009) Why fuzzy decision trees are good rankers. IEEE Trans Fuzzy Syst 17(6):1233–1244
  8. Janikow CZ, Kawa K (2005) Fuzzy decision tree FID. In: Proceedings of NAFIPS, pp 379–384
  9. Jordan MI, Jacobs RA (1994) Hierarchical mixtures of experts and the EM algorithm. Neural Comput 6(2):181–214
  10. Kumar GK, Viswanath P, Rao AA (2016) Ensemble of randomized soft decision trees for robust classification. Sādhanā 41(3):273–282
  11. Leisch F, Dimitriadou E (2009) mlbench: machine learning benchmark problems. R package version 1.1-6
  12. Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22
  13. Lichman M (2013) UCI machine learning repository. University of California, School of Information and Computer Sciences, Irvine. Accessed 3 Feb 2016
  14. Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313
  15. Norouzi M, Collins MD, Johnson M, Fleet DJ, Kohli P (2015) Efficient non-greedy optimization of decision trees. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R (eds) Advances in neural information processing systems. MIT Press, Cambridge, pp 1729–1737
  16. Olaru C, Wehenkel L (2003) A complete fuzzy decision tree technique. Fuzzy Sets Syst 138(2):221–254
  17. Otero FE, Freitas AA, Johnson CG (2012) Inducing decision trees with an ant colony optimization algorithm. Appl Soft Comput 12(11):3615–3626
  18. Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco
  19. Sofeikov KI, Tyukin IY, Gorban AN, Mirkes EM, Prokhorov DV, Romanenko IV (2014) Learning optimization for decision tree classification of non-categorical data with information gain impurity criterion. In: 2014 International joint conference on neural networks (IJCNN). IEEE, pp 3548–3555
  20. Suárez A, Lutsko JF (1999) Globally optimal fuzzy decision trees for classification and regression. IEEE Trans Pattern Anal Mach Intell 21:1297–1311
  21. Yıldız OT, İrsoy O, Alpaydın E (2016) Bagging soft decision trees. In: Holzinger A (ed) Machine learning for health informatics: state-of-the-art and future challenges. Springer, Cham, pp 25–36

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague 8, Czech Republic
