Abstract
The success rate of automated theorem provers on large theories depends heavily on the choice of given facts. Premise selection is the task of choosing the subset of given facts that is most likely to lead to a successful automated deduction proof of a given conjecture. Premise selection can be viewed as a multi-label classification problem, for which machine learning from related proofs is currently the most successful method. Random forests are a machine learning technique known to perform especially well on large datasets. In this paper, we evaluate random forest algorithms for premise selection. To address the specifics of automated reasoning, we propose a number of extensions to random forests, such as incremental learning, multi-path querying, depth weighting, feature IDF (inverse document frequency), and the integration of secondary classifiers in the tree leaves. With these extensions, we improve on the k-nearest neighbour algorithm both in prediction quality and in ATP performance.
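To make the feature-IDF idea from the abstract concrete, the sketch below is an illustration only, not the authors' implementation: facts and conjectures are represented as sets of symbol features, rare features receive higher inverse-document-frequency weight, and premises are ranked by weighted feature overlap with the conjecture. All fact names and feature symbols here are hypothetical toy data.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each known fact is described by a set of
# symbol features extracted from its statement.
facts = {
    "add_comm":  {"nat", "plus", "eq"},
    "mul_comm":  {"nat", "times", "eq"},
    "le_refl":   {"nat", "le"},
    "add_assoc": {"nat", "plus", "eq"},
}

def idf_weights(feature_sets):
    """Inverse document frequency: features occurring in few facts
    get a higher weight; ubiquitous features get weight ~0."""
    n = len(feature_sets)
    df = Counter(f for fs in feature_sets for f in fs)
    return {f: math.log(n / df[f]) for f in df}

def rank_premises(conjecture_features, facts):
    """Rank facts by IDF-weighted feature overlap with the conjecture."""
    idf = idf_weights(list(facts.values()))
    scores = {
        name: sum(idf.get(f, 0.0) for f in fs & conjecture_features)
        for name, fs in facts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# A conjecture sharing the "plus" and "eq" symbols ranks the
# addition lemmas first, since "nat" is too common to discriminate.
print(rank_premises({"nat", "plus", "eq"}, facts))
```

In the paper's setting the ranking would come from a random forest (with the listed extensions) rather than raw weighted overlap, but the IDF weighting plays the same role: down-weighting features shared by almost every fact.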
© 2015 Springer International Publishing Switzerland
Cite this paper
Färber, M., Kaliszyk, C. (2015). Random Forests for Premise Selection. In: Lutz, C., Ranise, S. (eds.) Frontiers of Combining Systems. FroCoS 2015. Lecture Notes in Computer Science, vol. 9322. Springer, Cham. https://doi.org/10.1007/978-3-319-24246-0_20
Print ISBN: 978-3-319-24245-3
Online ISBN: 978-3-319-24246-0