Large-Scale Experiments for Mathematical Document Classification

  • Conference paper
Digital Libraries: Social Media and Community Networks (ICADL 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8279)

Abstract

The ever-increasing amount of digitally available information is both a curse and a blessing. On the one hand, users have increasingly large amounts of information at their fingertips. On the other hand, assessing and refining web search results becomes more and more tiresome and difficult for non-experts in a domain. Established digital libraries therefore offer specialized collections with a certain degree of quality. This quality can largely be attributed to the great effort invested in the semantic enrichment of the provided documents, e.g. by annotating them with respect to a domain-specific taxonomy. In many domains this annotation is still done manually, e.g. in chemistry (CAS), medicine (MeSH), or mathematics (MSC). Because of the growing amount of data, however, this manual task is becoming more and more time-consuming and expensive. The only solution to this problem seems to be to employ automated classification algorithms, but it is difficult to draw conclusions for a real-world scenario from the evaluations done in previous research. We therefore conducted a large-scale feasibility study on a real-world data set from one of the biggest mathematical digital libraries, Zentralblatt MATH, with a special focus on practical applicability.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Barthel, S., Tönnies, S., Balke, WT. (2013). Large-Scale Experiments for Mathematical Document Classification. In: Urs, S.R., Na, JC., Buchanan, G. (eds) Digital Libraries: Social Media and Community Networks. ICADL 2013. Lecture Notes in Computer Science, vol 8279. Springer, Cham. https://doi.org/10.1007/978-3-319-03599-4_10

  • DOI: https://doi.org/10.1007/978-3-319-03599-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03598-7

  • Online ISBN: 978-3-319-03599-4

  • eBook Packages: Computer Science, Computer Science (R0)
