Model tree pruning


A model tree is a decision tree in which a specified model, such as a linear regression or naive Bayes model, is built on some of the leaf nodes. Compared with a typical decision tree, in which every leaf node is assigned a class label, a model tree has several advantages: the flexibility to handle mixed attributes, a simpler tree structure, and good potential for processing big data. This paper investigates a model tree in which an extreme learning machine (ELM) model is applied to some leaf nodes of the tree. It compares the two fundamental strategies for generating model trees, prepruning and postpruning, in terms of training complexity and generalization ability. The experimental results and algorithmic analysis show that, for the ELM model tree, postpruning achieves better performance than prepruning, even though prepruning has previously been widely regarded as one of the most popular decision tree generation strategies.
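The abstract does not spell out the algorithm, so the following is only a minimal sketch under our own assumptions, written in Python with the standard library. It illustrates the two ingredients the abstract names. The first is an ELM leaf model: hidden-layer weights are drawn at random and never trained, and only the output weights are solved by least squares. Here a small ridge term `reg` is added, which is our assumption, since the basic ELM uses the Moore-Penrose pseudoinverse. The second is a postpruning check that replaces a fully grown subtree with an ELM leaf when that leaf is no worse on held-out validation data. This is a reduced-error-style criterion and also our assumption, not necessarily the authors' criterion. The names `ELM`, `mse`, and `postprune_decision` are hypothetical. The sketch uses a scalar regression target, whereas a classifier would typically fit one-hot class targets instead.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic activation.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


class ELM:
    """Hypothetical leaf model: a single-hidden-layer ELM for a scalar target."""

    def __init__(self, n_hidden=20, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg  # ridge term (assumption; the basic ELM uses the pseudoinverse)
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Outputs of the randomly parameterized sigmoid hidden nodes for one sample.
        return [_sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        d, k = len(xs[0]), self.n_hidden
        # Hidden-layer weights and biases are random and are never trained.
        self.w = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(k)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(k)]
        h = [self._hidden(x) for x in xs]
        # Only the output weights are learned: (H^T H + reg * I) beta = H^T y.
        hth = [[sum(row[i] * row[j] for row in h) + (self.reg if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
        self.beta = _solve(hth, hty)
        return self

    def predict(self, x):
        return sum(hi * bi for hi, bi in zip(self._hidden(x), self.beta))


def mse(predict, xs, ys):
    # Mean squared error of a prediction function over a sample.
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


def postprune_decision(subtree_predict, node_xs, node_ys, val_xs, val_ys):
    """Return (prune, leaf). Prune when an ELM leaf fitted on the node's training
    samples is no worse than the grown subtree on held-out validation samples."""
    leaf = ELM().fit(node_xs, node_ys)
    prune = mse(leaf.predict, val_xs, val_ys) <= mse(subtree_predict, val_xs, val_ys)
    return prune, leaf
```

Under postpruning, as sketched here, the full tree is grown first and a check like this is applied bottom-up to each internal node. Under prepruning, the decision to stop is made before a node is split, so the subtree is never grown in the first place.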


Fig. 1
Fig. 2
Fig. 3
Fig. 4




Acknowledgements

We would like to express our gratitude to all those who helped us during the writing of this paper. We gratefully acknowledge the help of our supervisor, Prof. XiZhao Wang, who offered valuable suggestions to revise and improve this paper. This work was supported in part by the National Natural Science Foundation of China (Grants 61772344 and 61732011) and in part by the Natural Science Foundation of SZU (Grants 827-000140, 827-000230, and 2017060).

Author information



Corresponding author

Correspondence to Dasen Yan.



About this article


Cite this article

Zhou, X., Yan, D. Model tree pruning. Int. J. Mach. Learn. & Cyber. 10, 3431–3444 (2019).



  • Model tree
  • Pruning
  • Decision tree
  • Extreme learning machine
  • ELM-Tree