Random Forests for Premise Selection

  • Conference paper
Frontiers of Combining Systems (FroCoS 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9322)

Abstract

The success rates of automated theorem provers (ATPs) in large theories depend heavily on the choice of the given facts. Premise selection is the task of choosing the subset of the given facts that is most likely to lead to a successful automated proof of a given conjecture. Premise selection can be viewed as a multi-label classification problem, for which machine learning from related proofs is currently the most successful method. Random forests are a machine learning technique known to perform especially well on large datasets. In this paper, we evaluate random forest algorithms for premise selection. To deal with the specifics of automated reasoning, we propose a number of extensions to random forests, such as incremental learning, multi-path querying, depth weighting, feature IDF (inverse document frequency), and the integration of secondary classifiers in the tree leaves. With these extensions, we improve on the k-nearest neighbour algorithm in terms of both prediction quality and ATP performance.
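To make the multi-label framing concrete, here is a minimal Python sketch of a plain bagged random forest for premise selection. It assumes that each proved theorem is one training example whose features and used premises are both sets of strings, that splits test the presence of a single feature, that the split criterion is a Gini-style multi-label impurity, and that leaves store premise counts. These choices, all function names (`build_forest`, `rank_premises`, and so on) and the toy premise names are hypothetical. None of the paper's extensions (incremental learning, multi-path querying, depth weighting, feature IDF, secondary classifiers in the leaves) are shown, so this is not the authors' method.

```python
import random
from collections import Counter
from typing import FrozenSet, List, Tuple

# One training example: (features of a proved theorem, premises used in its proof).
# Representing both as sets of strings is an assumption of this sketch.
Sample = Tuple[FrozenSet[str], FrozenSet[str]]


def impurity(samples: List[Sample]) -> float:
    """Multi-label Gini-style impurity: sum over premises of p * (1 - p).

    This splitting criterion is an illustrative assumption.
    """
    n = len(samples)
    counts = Counter(p for _, premises in samples for p in premises)
    return sum((c / n) * (1 - c / n) for c in counts.values())


def build_tree(samples: List[Sample], rng: random.Random, max_depth: int, n_candidates: int):
    """Return a leaf (Counter of premises) or an inner node (feature, yes_subtree, no_subtree)."""
    leaf = Counter(p for _, premises in samples for p in premises)
    features = sorted({f for feats, _ in samples for f in feats})
    if max_depth == 0 or len(samples) < 2 or not features:
        return leaf
    best = None
    # As in random forests, only a random subset of the features is tried at each node.
    for f in rng.sample(features, min(n_candidates, len(features))):
        yes = [s for s in samples if f in s[0]]
        no = [s for s in samples if f not in s[0]]
        if not yes or not no:
            continue  # this feature does not separate the samples at this node
        score = len(yes) * impurity(yes) + len(no) * impurity(no)
        if best is None or score < best[0]:
            best = (score, f, yes, no)
    if best is None:
        return leaf
    _, f, yes, no = best
    return (f,
            build_tree(yes, rng, max_depth - 1, n_candidates),
            build_tree(no, rng, max_depth - 1, n_candidates))


def query_tree(node, feats: FrozenSet[str]) -> Counter:
    """Follow a single root-to-leaf path and return that leaf's premise counts."""
    while isinstance(node, tuple):
        f, yes, no = node
        node = yes if f in feats else no
    return node


def build_forest(samples: List[Sample], n_trees: int, seed: int = 0,
                 max_depth: int = 8, n_candidates: int = 4) -> list:
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: each tree is trained on a bootstrap sample drawn with replacement.
        bag = [rng.choice(samples) for _ in samples]
        forest.append(build_tree(bag, rng, max_depth, n_candidates))
    return forest


def rank_premises(forest: list, feats: FrozenSet[str], k: int) -> List[str]:
    """Add up the leaf premise counts over all trees and return the k most frequent."""
    votes: Counter = Counter()
    for tree in forest:
        votes.update(query_tree(tree, feats))
    return [p for p, _ in votes.most_common(k)]


if __name__ == "__main__":
    # Hypothetical toy data: theorem features and the premises their proofs used.
    data = [
        (frozenset({"add", "nat"}), frozenset({"add_comm"})),
        (frozenset({"add", "int"}), frozenset({"add_comm", "add_assoc"})),
        (frozenset({"mul", "nat"}), frozenset({"mul_comm"})),
        (frozenset({"mul", "int"}), frozenset({"mul_comm", "mul_assoc"})),
    ]
    forest = build_forest(data, n_trees=10)
    print(rank_premises(forest, frozenset({"add", "nat"}), k=2))
```

This sketch follows exactly one path per tree (`query_tree`) and returns a ranked list of premises rather than a single class, which is what an ATP consumes. The multi-path querying and depth weighting named in the abstract are not implemented here.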


Author information


Correspondence to Michael Färber.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Färber, M., Kaliszyk, C. (2015). Random Forests for Premise Selection. In: Lutz, C., Ranise, S. (eds) Frontiers of Combining Systems. FroCoS 2015. Lecture Notes in Computer Science, vol. 9322. Springer, Cham. https://doi.org/10.1007/978-3-319-24246-0_20

  • DOI: https://doi.org/10.1007/978-3-319-24246-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24245-3

  • Online ISBN: 978-3-319-24246-0

  • eBook Packages: Computer Science, Computer Science (R0)
