Abstract
The success rate of automated theorem provers on large theories depends heavily on the choice of given facts. Premise selection is the task of choosing the subset of given facts that is most likely to lead to a successful automated deduction proof of a given conjecture. Premise selection can be viewed as a multi-label classification problem, for which machine learning from related proofs is currently the most successful method. Random forests are a machine learning technique known to perform especially well on large datasets. In this paper, we evaluate random forest algorithms for premise selection. To address the specifics of automated reasoning, we propose a number of extensions to random forests, such as incremental learning, multi-path querying, depth weighting, feature IDF (inverse document frequency), and the integration of secondary classifiers in the tree leaves. With these extensions, we improve on the k-nearest neighbour algorithm both in prediction quality and in ATP performance.
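To make the feature-IDF idea from the abstract concrete, the sketch below is an illustration only, not the authors' implementation: facts and conjectures are represented as sets of symbol features, rare features receive higher inverse-document-frequency weight, and premises are ranked by weighted feature overlap with the conjecture. All fact names and feature symbols here are hypothetical toy data.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each known fact is described by a set of
# symbol features extracted from its statement.
facts = {
    "add_comm":  {"nat", "plus", "eq"},
    "mul_comm":  {"nat", "times", "eq"},
    "le_refl":   {"nat", "le"},
    "add_assoc": {"nat", "plus", "eq"},
}

def idf_weights(feature_sets):
    """Inverse document frequency: features occurring in few facts
    get a higher weight; ubiquitous features get weight ~0."""
    n = len(feature_sets)
    df = Counter(f for fs in feature_sets for f in fs)
    return {f: math.log(n / df[f]) for f in df}

def rank_premises(conjecture_features, facts):
    """Rank facts by IDF-weighted feature overlap with the conjecture."""
    idf = idf_weights(list(facts.values()))
    scores = {
        name: sum(idf.get(f, 0.0) for f in fs & conjecture_features)
        for name, fs in facts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# A conjecture sharing the "plus" and "eq" symbols ranks the
# addition lemmas first, since "nat" is too common to discriminate.
print(rank_premises({"nat", "plus", "eq"}, facts))
```

In the paper's setting the ranking would come from a random forest (with the listed extensions) rather than raw weighted overlap, but the IDF weighting plays the same role: down-weighting features shared by almost every fact.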
© 2015 Springer International Publishing Switzerland
Cite this paper
Färber, M., Kaliszyk, C. (2015). Random Forests for Premise Selection. In: Lutz, C., Ranise, S. (eds.) Frontiers of Combining Systems. FroCoS 2015. Lecture Notes in Computer Science, vol. 9322. Springer, Cham. https://doi.org/10.1007/978-3-319-24246-0_20
Print ISBN: 978-3-319-24245-3
Online ISBN: 978-3-319-24246-0