Large-Scale Experiments for Mathematical Document Classification

Barthel, Simon; Tönnies, Sascha; Balke, Wolf-Tilo

doi:10.1007/978-3-319-03599-4_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8279)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

The ever-increasing amount of digitally available information is a curse and a blessing at the same time. On the one hand, users have increasingly large amounts of information at their fingertips. On the other hand, assessing and refining web search results becomes more and more tiresome and difficult for non-experts in a domain. Therefore, established digital libraries offer specialized collections with a certain degree of quality. This quality can largely be attributed to the great effort invested in the semantic enrichment of the provided documents, e.g., by annotating them with respect to a domain-specific taxonomy. This process is still done manually in many domains, e.g., chemistry (CAS), medicine (MeSH), or mathematics (MSC). But due to the growing amount of data, this manual task is becoming more and more time-consuming and expensive. The only solution to this problem seems to be to employ automated classification algorithms, but it is difficult to draw conclusions about real-world scenarios from the evaluations done in previous research. We therefore conducted a large-scale feasibility study on a real-world data set from one of the biggest mathematical digital libraries, Zentralblatt MATH, with a special focus on practical applicability.

Author information

Authors and Affiliations

IFIS TU Braunschweig, Mühlenpfordstraße 23, 38106, Braunschweig, Germany
Simon Barthel & Wolf-Tilo Balke
L3S Research Center, Appelstraße 9a, 30167, Hannover, Germany
Sascha Tönnies & Wolf-Tilo Balke

Editor information

Editors and Affiliations

International School of Information Management, University of Mysore, Mysore, India
Shalini R. Urs
Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore
Jin-Cheon Na
School of Informatics, City University London, London, UK
George Buchanan

About this paper

Cite this paper

Barthel, S., Tönnies, S., Balke, WT. (2013). Large-Scale Experiments for Mathematical Document Classification. In: Urs, S.R., Na, JC., Buchanan, G. (eds) Digital Libraries: Social Media and Community Networks. ICADL 2013. Lecture Notes in Computer Science, vol 8279. Springer, Cham. https://doi.org/10.1007/978-3-319-03599-4_10

DOI: https://doi.org/10.1007/978-3-319-03599-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03598-7
Online ISBN: 978-3-319-03599-4
eBook Packages: Computer Science, Computer Science (R0)
