
Using Local Rules in Random Forests of Decision Trees

  • Thanh-Nghi Do
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9446)

Abstract

We propose using local labeling rules in random forests of decision trees to classify data more effectively. The classical random forest labels examples at the terminal nodes of its trees by majority vote, which can degrade classification performance. Our investigation replaces these majority-vote rules with local ones, namely support vector machines trained at the terminal nodes, to improve the prediction accuracy of decision forests. Numerical results on 8 datasets from the UCI repository and 2 handwritten letter recognition benchmarks show that our proposal is more accurate than the classical random forest algorithm.
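To make the idea concrete, the sketch below (not the authors' implementation) builds a random forest and, in every impure terminal node, fits a local linear SVM on the training examples that reach that node; at prediction time each example is routed to its leaf's SVM and the trees' outputs are combined by the usual majority vote. The class name LeafSVMForest, its parameters, and the digits dataset used in the demonstration are illustrative assumptions, as is the choice of scikit-learn.

```python
# A minimal sketch of the idea (not the authors' implementation), assuming
# scikit-learn and NumPy: a random forest whose terminal nodes label examples
# with local linear SVMs instead of the usual majority vote.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC


class LeafSVMForest:
    """Random forest with a local linear SVM fitted in every impure leaf."""

    def __init__(self, n_trees=50, min_leaf=30):
        self.forest = RandomForestClassifier(
            n_estimators=n_trees, min_samples_leaf=min_leaf, random_state=0)
        self.leaf_models = []  # per tree: {leaf_id: LinearSVC or constant label}

    def fit(self, X, y):
        self.forest.fit(X, y)
        leaf_ids = self.forest.apply(X)  # (n_samples, n_trees) leaf indices
        for t in range(leaf_ids.shape[1]):
            models = {}
            for leaf in np.unique(leaf_ids[:, t]):
                mask = leaf_ids[:, t] == leaf
                classes = np.unique(y[mask])
                if classes.size == 1:
                    models[leaf] = int(classes[0])  # pure leaf: constant label
                else:                               # impure leaf: local SVM
                    models[leaf] = LinearSVC(C=1.0, max_iter=5000).fit(X[mask], y[mask])
            self.leaf_models.append(models)
        return self

    def predict(self, X):
        leaf_ids = self.forest.apply(X)
        votes = np.empty(leaf_ids.shape, dtype=int)
        for t, models in enumerate(self.leaf_models):
            for leaf in np.unique(leaf_ids[:, t]):
                mask = leaf_ids[:, t] == leaf
                model = models[leaf]
                votes[mask, t] = model.predict(X[mask]) if hasattr(model, "predict") else model
        # Aggregate the trees' local predictions by majority vote, as in a standard forest.
        return np.array([np.bincount(row).argmax() for row in votes])


if __name__ == "__main__":
    # Small demonstration on a handwritten-digit dataset (an illustrative stand-in
    # for the paper's handwriting benchmarks); labels must be non-negative integers.
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = LeafSVMForest().fit(X_train, y_train)
    print("test accuracy:", (clf.predict(X_test) == y_test).mean())
```

The min_samples_leaf setting keeps each terminal node large enough to train a meaningful local classifier; pure leaves simply return their constant label, so the local SVM is only paid for where the majority vote would actually be ambiguous.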

Keywords

Decision trees · Random forests · Labeling rules · Local rules · Support vector machines (SVM)


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. College of Information Technology, Can Tho University, Can Tho, Vietnam
