Applied Intelligence, Volume 49, Issue 3, pp 1127–1145

Improving lazy decision tree for imbalanced classification by using skew-insensitive criteria

  • Chong Su
  • Jie Cao


Lazy decision tree (LazyDT) constructs a customized decision tree for each test instance, consisting of a single path from the root to a leaf node. LazyDT has two strengths compared with eager decision trees: it builds shorter decision paths, and it avoids unnecessary data fragmentation. However, the split criterion LazyDT uses to construct a customized tree is information gain, which is skew-sensitive: when learning from imbalanced data sets, class imbalance impedes its ability to learn the minority-class concept. In this paper, we use Hellinger distance and K-L divergence as split criteria to build two types of lazy decision trees. Experiments across a wide range of imbalanced data sets investigate the effectiveness of our methods in comparison with lazy decision trees, C4.5, Hellinger distance decision trees, and support vector machines. In addition, we use SMOTE to preprocess the highly imbalanced data sets in the experiments and evaluate its effectiveness. The experimental results, validated by nonparametric statistical tests, demonstrate that using Hellinger distance or K-L divergence as the split criterion effectively improves the performance of LazyDT for imbalanced classification.
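The skew-insensitive criteria described above can be sketched as follows. This is a minimal illustration of scoring one candidate split by the Hellinger distance between the within-class branch distributions (as in Hellinger distance decision trees), plus a K-L divergence variant; it is not the authors' exact implementation, and the `eps` smoothing term in the K-L score is an assumption added to avoid log(0).

```python
import math

def hellinger_split(pos_counts, neg_counts):
    """Hellinger distance between the within-class distributions of
    instances over a split's branches. pos_counts[i] / neg_counts[i]
    are the positives / negatives falling into branch i."""
    pos_total, neg_total = sum(pos_counts), sum(neg_counts)
    return math.sqrt(sum(
        (math.sqrt(p / pos_total) - math.sqrt(n / neg_total)) ** 2
        for p, n in zip(pos_counts, neg_counts)))

def kl_split(pos_counts, neg_counts, eps=1e-12):
    """K-L divergence between the same two within-class branch
    distributions (eps avoids log(0); an illustrative choice)."""
    pos_total, neg_total = sum(pos_counts), sum(neg_counts)
    score = 0.0
    for p, n in zip(pos_counts, neg_counts):
        pp = p / pos_total + eps
        qq = n / neg_total + eps
        score += pp * math.log(pp / qq)
    return score

# A split that separates the classes perfectly scores high:
print(hellinger_split([10, 0], [0, 90]))   # sqrt(2) ~ 1.414
# A split that merely mirrors the overall class ratio scores 0,
# no matter how skewed that ratio is -- unlike information gain,
# the score depends only on per-class fractions, not class priors.
print(hellinger_split([5, 5], [45, 45]))   # 0.0
```

Because both scores compare normalized per-class distributions, the 9:1 class skew in the example does not inflate or deflate them, which is what makes these criteria suitable for growing each LazyDT path on imbalanced data.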


Imbalanced learning · Lazy decision tree · Hellinger distance · K-L divergence · SMOTE



We would like to acknowledge support for this project from the China Postdoctoral Science Foundation (2016M600430), the National Social Science Foundation of China (16ZDA054), the Jiangsu Provincial 333 Project (BRA2017396), the Six Major Talents Peak Project of Jiangsu Province (XYDXXJS-CXTD-005), and the Jiangsu Province outstanding innovation team in philosophy and social science in colleges and universities (2015ZSTD006). The authors would also like to express their gratitude to the donors of the different data sets and the maintainers of the KEEL Data Set Repository.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Key Laboratory of Meteorological Disaster, Ministry of Education (KLME), Joint International Research Laboratory of Climate and Environment Change (ILCEC), Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters (CIC-FEMD), School of Information and Control, Nanjing University of Information Science and Technology, Nanjing, China
  2. Nanjing, China
  3. School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, China
