Automated Classification and Categorization of Mathematical Knowledge

Řehůřek, Radim; Sojka, Petr

doi:10.1007/978-3-540-85110-3_44

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5144)

Included in the following conference series:

International Conference on Intelligent Computer Mathematics

Abstract

The Mathematics Subject Classification (MSC) is a common system for categorizing mathematical papers and knowledge. We present results of machine learning of the MSC on the full texts of papers in the mathematical digital libraries DML-CZ and NUMDAM. The F1-measure achieved on the classification task for top-level MSC categories exceeds 89%. We also describe and evaluate our methods for measuring the similarity of papers in the digital library, based on their full texts.
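
For readers unfamiliar with the F1-measure quoted above, the following is a minimal sketch of how it could be computed for papers that may carry several top-level MSC codes. The micro-averaging choice, the function name `micro_f1` and the toy labels are assumptions made for illustration; the abstract does not state which averaging the authors used.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over multi-label assignments.

    gold, predicted: sequences of sets of category codes, one set per paper.
    Hypothetical helper for illustration, not the authors' evaluation code.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # categories correctly assigned
        fp += len(p - g)   # categories assigned but not in the gold labels
        fn += len(g - p)   # gold categories the classifier missed
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
    return 2 * tp / denom if denom else 0.0


# Toy data (made up): three papers with top-level MSC codes.
gold = [{"03"}, {"05", "68"}, {"11"}]
predicted = [{"03"}, {"05"}, {"14"}]
score = micro_f1(gold, predicted)
print(round(score, 3))
```

Micro-averaging pools the counts over all categories, so frequent categories dominate the score; macro-averaging (the mean of per-category F1 values) would weight rare categories equally.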

Author information

Authors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Radim Řehůřek & Petr Sojka

Editor information

Serge Autexier John Campbell Julio Rubio Volker Sorge Masakazu Suzuki Freek Wiedijk

Cite this paper

Řehůřek, R., Sojka, P. (2008). Automated Classification and Categorization of Mathematical Knowledge. In: Autexier, S., Campbell, J., Rubio, J., Sorge, V., Suzuki, M., Wiedijk, F. (eds) Intelligent Computer Mathematics. CICM 2008. Lecture Notes in Computer Science, vol 5144. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85110-3_44

DOI: https://doi.org/10.1007/978-3-540-85110-3_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85109-7
Online ISBN: 978-3-540-85110-3
eBook Packages: Computer Science, Computer Science (R0)
