Abstract
Random forest (RF) is a tree-based learning method, which exhibits a high ability to generalize on real data sets. Nevertheless, a possible limitation of RF is that it generates a forest consisting of many trees and rules, thus it is viewed as a black box model. In this paper, the RF+HC methods for rule extraction from RF are proposed. Once the RF is built, a hill climbing algorithm is used to search for a rule set such that it reduces the number of rules dramatically, which significantly improves comprehensibility of the underlying model built by RF. The proposed methods are evaluated on eighteen UCI and four microarray data sets. Our experimental results show that the proposed methods outperform one of the state-of-the-art methods in terms of scalability and comprehensibility while preserving the same level of accuracy.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences 96(12), 6745–6750 (1999)
Bernard, S., Heutte, L., Adam, S.: On the selection of decision trees in random forests. In: International Joint Conference on Neural Networks, IJCNN 2009, pp. 302–307. IEEE (2009)
Blake, C., Keogh, E., Merz, C.J.: Uci repository of machine learning data bases MLRepository. html (1998). www.ics.uci.edu/mlearn
Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)
Caruana, R., Niculescu-Mizil, A.: An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd International Conference on Machine Learning, ICML 2006, pp. 161–168. ACM (2006)
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Díaz-Uriarte, R., Andres, S.A.D.: Gene selection and classification of microarray data using random forest. BMC Bioinformatics 7(1), 3 (2006)
Friedman, J.H., Fisher, N.I.: Bump hunting in high-dimensional data. Statistics and Computing 9(2), 123–143 (1999)
Friedman, J.H., Popescu, B.E.: Predictive learning via rule ensembles. The Annals of Applied Statistics, 916–954 (2008)
Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., et al.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439), 531–537 (1999)
Huysmans, J., Baesens, B., Vanthienen, J.: Using rule extraction to improve the comprehensibility of predictive models. DTEW-KBI_0612, 1–55 (2006)
Johansson, U., Sonstrod, C., Lofstrom, T.: One tree to explain them all. In: 2011 IEEE Congress on Evolutionary Computation (CEC), pp. 1444–1451. IEEE (2011)
Latinne, P., Debeir, O., Decaestecker, C.: Limiting the number of trees in random forests. In: Kittler, J., Roli, F. (eds.) MCS 2001. LNCS, vol. 2096, pp. 178–187. Springer, Heidelberg (2001)
Liu, S., Patel, R.Y., Daga, P.R., Liu, H., Fu, G., Doerksen, R., Chen, Y., Wilkins, D.: Multi-class joint rule extraction and feature selection for biological data. In: 2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 476–481. IEEE (2011)
Liu, S., Patel, R.Y., Daga, P.R., Liu, H., Fu, G., Doerksen, R.J., Chen, Y., Wilkins, D.E.: Combined rule extraction and feature elimination in supervised classification. IEEE Transactions on NanoBioscience 11(3), 228–236 (2012)
Martinez-Muoz, G., Hernández-Lobato, D., Suárez, A.: An analysis of ensemble pruning techniques based on ordered aggregation. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(2), 245–259 (2009)
Meinshausen, N.: Node harvest. The Annals of Applied Statistics, 2049–2072 (2010)
Näppi, J.J., Regge, D., Yoshida, H.: Comparative performance of random forest and support vector machine classifiers for detection of colorectal lesions in ct colonography. In: Yoshida, H., Sakas, G., Linguraru, M.G. (eds.) Abdominal Imaging. LNCS, vol. 7029, pp. 27–34. Springer, Heidelberg (2012)
Nutt, C.L., Mani, D.R., Betensky, R.A., Pablo Tamayo, J., Cairncross, G., Ladd, C., Pohl, U., Hartmann, C., McLaughlin, M.E., Batchelor, T.T., et al.: Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Research 63(7), 1602–1607 (2003)
Sarkar, B.K., Sana, S.S., Chaudhuri, K.: A genetic algorithm-based rule extraction system. Applied Soft Computing 12(1), 238–254 (2012)
Selman, B., Gomes, C.P.: Hill-climbing search. Encyclopedia of Cognitive Science (2006)
Shi, T., Horvath, S.: Unsupervised learning with random forest predictors. Journal of Computational and Graphical Statistics 15(1) (2006)
Song, L., Langfelder, P., Horvath, S.: Random generalized linear model: a highly accurate and interpretable ensemble predictor. BMC Bioinformatics 14(1), 5 (2013)
Van Assche, A., Blockeel, H.: Seeing the forest through the trees: learning a comprehensible model from an ensemble. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds.) ECML 2007. LNCS (LNAI), vol. 4701, pp. 418–429. Springer, Heidelberg (2007)
Veer, L.J., Dai, H., Vijver, J.V.D., He, Y.D., Hart, A.A.M., Mao, M., Peterse, H.L., Kooy, K., Marton, M.J., Witteveen, A.T., et al.: Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871), 530–536 (2002)
Yang, F., Wei-hang, L., Luo, L., Li, T.: Margin optimization based pruning for random forest. Neurocomputing 94, 54–63 (2012)
Zhang, H., Wang, M.: Search for the smallest random forest. Statistics and its Interface 2(3), 381 (2009)
Zhou, Z.-H., Jiang, Y., Chen, S.-F.: Extracting symbolic rules from trained neural network ensembles. Ai Communications 16(1), 3–15 (2003)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Mashayekhi, M., Gras, R. (2015). Rule Extraction from Random Forest: the RF+HC Methods. In: Barbosa, D., Milios, E. (eds) Advances in Artificial Intelligence. Canadian AI 2015. Lecture Notes in Computer Science(), vol 9091. Springer, Cham. https://doi.org/10.1007/978-3-319-18356-5_20
Download citation
DOI: https://doi.org/10.1007/978-3-319-18356-5_20
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18355-8
Online ISBN: 978-3-319-18356-5
eBook Packages: Computer ScienceComputer Science (R0)