Skip to main content

Automated Classification and Categorization of Mathematical Knowledge

  • Conference paper
Book cover Intelligent Computer Mathematics (CICM 2008)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5144))

Included in the following conference series:

Abstract

There is a commonMathematics SubjectClassification(MSC) System used for categorizing mathematical papers and knowledge. We present results of machine learning of the MSC on full texts of papers in the mathematical digital libraries DML-CZ and NUMDAM. The F1- measure achieved on classification task of top-level MSC categories exceeds 89%. We describe and evaluate our methods for measuring the similarity of papers in the digital library based on paper full texts.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Royal Society of London: Catalogue of scientific papers 1800–1900 vol. 1–19 and Subject Index in 4 vols (published, 1867–1925) (1908), free electronic version available by project Gallica http://gallica.bnf.fr/

  2. Ohrtmann, C., Müller, F., (eds.): Jahrbuch über die Fortschritte der Mathematik vol. 1–68 (1868–1942) Druck und Verlag von Georg Reimer, Berlin (1871–1942); electronic version available by project ERAM, http://www.emis.de/projects/JFM/

  3. Bouche, T.: Towards a Digital Mathematics Library? In: Rocha, E.M. (ed.) CMDE 2006: Communicating Mathematics in the Digital Era, pp. 43–68. A.K. Peters, MA, USA (2008)

    Google Scholar 

  4. Sojka, P.: From Scanned Image to Knowledge Sharing. In: Tochtermann, K., Maurer, H. (eds.) Proceedings of I-KNOW 2005: Fifth International Conference on Knowledge Management, Graz, Austria, Know-Center in coop, Graz Uni, pp. 664–672. Joanneum Research and Springer Pub. Co (2005)

    Google Scholar 

  5. Bartošek, M., Lhoták, M., Rákosník, J., Sojka, P., Šárfy, M.: DML-CZ: The Objectives and the First Steps. In: Borwein, J., Rocha, E.M., Rodrigues, J.F. (eds.) CMDE 2006: Communicating Mathematics in the Digital Era, pp. 69–79. A.K. Peters, MA, USA (2008)

    Google Scholar 

  6. Dunning, T.: Statistical identification of language. Technical Report MCCS 94-273, New Mexico State University, Computing Research Lab (1994)

    Google Scholar 

  7. Sojka, P., Panák, R., Mudrák, T.: Optical Character Recognition of Mathematical Texts in the DML-CZ Project. Technical report, Masaryk University, Brno. CMDE 2006 conference in Aveiro, Portugal (presented, 2006)

    Google Scholar 

  8. Pomikálek, J., Řehůřek, R.: The Influence of Preprocessing Parameters on Text Categorization. International Journal of Applied Science, Engineering and Technology 1, 430–434 (2007)

    Google Scholar 

  9. Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34, 1–47 (2002)

    Article  MathSciNet  Google Scholar 

  10. Yang, Y., Joachims, T.: Text categorization. Scholarpedia (2008), http://www.scholarpedia.org/article/Text_categorization

  11. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)

    Google Scholar 

  12. Krovetz, R.: Viewing morphology as an inference process. In: Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Linguistic Analysis, pp. 191–202 (1993)

    Google Scholar 

  13. Yang, Y., Pedersen, J.O.: A comparative study on feature selection in text categorization. In: Fisher, D.H. (ed.) Proceedings of ICML 1997, 14th International Conference on Machine Learning, pp. 412–420. Morgan Kaufmann, San Francisco (1997)

    Google Scholar 

  14. Galavotti, L., Sebastiani, F., Simi, M.: Experiments on the use of feature selection and negative evidence in automated text categorization. In: Borbinha, J.L., Baker, T. (eds.) ECDL 2000. LNCS, vol. 1923, pp. 59–68. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  15. Forman, G.: An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research 3, 1289–1305 (2003)

    Article  MATH  Google Scholar 

  16. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing and Management 24, 513–523 (1988)

    Article  Google Scholar 

  17. Lee, J.H.: Analyses of multiple evidence combination. In: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Combination Techniques, pp. 267–276 (1997)

    Google Scholar 

  18. Yang, Y.: A Study on Thresholding Strategies for Text Categorization. In: Croft, W.B., Harper, D.J., Kraft, D.H., Zobel, J. (eds.) Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2001), pp. 137–145. ACM Press, New York (2001)

    Chapter  Google Scholar 

  19. Gandrabur, S., Foster, G., Lapalme, G.: Confidence Estimation for NLP Applications. ACM Transactions on Speech and Language Processing 3, 1–29 (2006)

    Article  Google Scholar 

  20. Esuli, A., Fagni, T., Sebastiani, F.: Boosting multi-label hierarchical text categorization. Information Retrieval 11 (2008)

    Google Scholar 

  21. Allen, J.A.: The international catalogue of scientific literature. The Auk. 21, 494–501 (1904)

    Google Scholar 

  22. Rusin, D.: The Mathematical Atlas—A Gateway to Modern Mathematics (2002), http://www.math-atlas.org/welcome.html

  23. Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. Journal of the American Society of Information Science 41, 391–407 (1990)

    Article  Google Scholar 

  24. Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19, pp. 153–160. MIT Press, Cambridge (2007)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Serge Autexier John Campbell Julio Rubio Volker Sorge Masakazu Suzuki Freek Wiedijk

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Řehůřek, R., Sojka, P. (2008). Automated Classification and Categorization of Mathematical Knowledge. In: Autexier, S., Campbell, J., Rubio, J., Sorge, V., Suzuki, M., Wiedijk, F. (eds) Intelligent Computer Mathematics. CICM 2008. Lecture Notes in Computer Science(), vol 5144. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85110-3_44

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-85110-3_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85109-7

  • Online ISBN: 978-3-540-85110-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics