Rule Extraction from Random Forest: the RF+HC Methods

Mashayekhi, Morteza; Gras, Robin

doi:10.1007/978-3-319-18356-5_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9091)

Included in the following conference series:

Canadian Conference on Artificial Intelligence

Abstract

Random forest (RF) is a tree-based learning method that generalizes well on real data sets. A possible limitation of RF, however, is that it generates a forest consisting of many trees and rules, so it is viewed as a black-box model. In this paper, the RF+HC methods for rule extraction from RF are proposed. Once the RF is built, a hill-climbing algorithm searches for a rule set that dramatically reduces the number of rules, which significantly improves the comprehensibility of the underlying model built by RF. The proposed methods are evaluated on eighteen UCI and four microarray data sets. Our experimental results show that the proposed methods outperform one of the state-of-the-art methods in terms of scalability and comprehensibility while preserving the same level of accuracy.
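
The idea of searching for a small rule set with hill climbing can be pictured with a minimal Python sketch. This page does not give the chapter's rule representation, fitness function, or hill-climbing variant, so everything below is an assumption made for illustration. Each rule is stored as its per-sample prediction (None where the rule does not fire). The names vote, score, and hill_climb are hypothetical, as is the objective "accuracy minus a size penalty". The data are a toy example, not results from the paper.

```python
import random

def vote(selected, rule_preds, n_samples, default=0):
    """Majority vote of the selected rules; uncovered samples get `default`."""
    out = []
    for i in range(n_samples):
        counts = {}
        for r in selected:
            p = rule_preds[r][i]
            if p is not None:  # None means the rule does not fire on sample i
                counts[p] = counts.get(p, 0) + 1
        # Ties are broken in favour of the smallest class label.
        out.append(max(sorted(counts), key=counts.get) if counts else default)
    return out

def score(selected, rule_preds, y, penalty):
    """Assumed objective: accuracy minus a penalty on the fraction of rules kept."""
    if not selected:
        return float("-inf")  # an empty rule set is never acceptable
    preds = vote(selected, rule_preds, len(y))
    acc = sum(p == t for p, t in zip(preds, y)) / len(y)
    return acc - penalty * len(selected) / len(rule_preds)

def hill_climb(rule_preds, y, penalty=0.1, max_iters=2000, seed=0):
    """Stochastic hill climbing: flip one rule in or out, keep strict improvements."""
    rng = random.Random(seed)
    n = len(rule_preds)
    current = {rng.randrange(n)}  # start from a single random rule
    best = score(current, rule_preds, y, penalty)
    for _ in range(max_iters):
        candidate = current ^ {rng.randrange(n)}  # toggle membership of one rule
        s = score(candidate, rule_preds, y, penalty)
        if s > best:
            current, best = candidate, s
    return sorted(current), best

# Toy data (hypothetical): 8 samples, 5 candidate rules taken from "trees".
y = [0, 1, 0, 1, 1, 0, 1, 0]
rule_preds = [
    [0, None, 0, None, None, 0, None, 0],     # fires on class-0 samples
    [None, 1, None, 1, 1, None, 1, None],     # fires on class-1 samples
    [1, 1, None, None, 0, 0, None, None],     # noisy rule
    [None, None, 1, 0, None, None, 0, 1],     # noisy rule
    [0, 1, 0, 1, None, None, None, None],     # partially correct rule
]

selected, best = hill_climb(rule_preds, y)
print("selected rules:", selected, "score:", round(best, 3))
```

In this sketch the penalty weight controls the trade-off between accuracy and the number of rules kept. A search of this kind can stop at a local optimum, so running it with several seeds is a natural extension.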

Author information

Authors and Affiliations

School of Computer Science, University of Windsor, Windsor, ON, Canada
Morteza Mashayekhi & Robin Gras

Corresponding author

Correspondence to Morteza Mashayekhi.

Editor information

Editors and Affiliations

University of Alberta, Edmonton, Canada
Denilson Barbosa
Dalhousie University, Halifax, Canada
Evangelos Milios

About this paper

Cite this paper

Mashayekhi, M., Gras, R. (2015). Rule Extraction from Random Forest: the RF+HC Methods. In: Barbosa, D., Milios, E. (eds) Advances in Artificial Intelligence. Canadian AI 2015. Lecture Notes in Computer Science, vol 9091. Springer, Cham. https://doi.org/10.1007/978-3-319-18356-5_20

DOI: https://doi.org/10.1007/978-3-319-18356-5_20
Published: 29 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18355-8
Online ISBN: 978-3-319-18356-5
eBook Packages: Computer Science, Computer Science (R0)
