Large-Scale Experiments for Mathematical Document Classification

  • Simon Barthel
  • Sascha Tönnies
  • Wolf-Tilo Balke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8279)


The ever-increasing amount of digitally available information is a curse and a blessing at the same time. On the one hand, users have increasingly large amounts of information at their fingertips. On the other hand, assessing and refining web search results becomes more and more tiresome and difficult for non-experts in a domain. Therefore, established digital libraries offer specialized collections with a certain degree of quality. This quality can largely be attributed to the great effort invested in the semantic enrichment of the provided documents, e.g., by annotating them with respect to a domain-specific taxonomy. This process is still done manually in many domains, e.g., chemistry (CAS), medicine (MeSH), or mathematics (MSC). But due to the growing amount of data, this manual task is becoming more and more time-consuming and expensive. The only solution to this problem seems to be to employ automated classification algorithms, but it is difficult to draw conclusions for real-world scenarios from the evaluations done in previous research. We therefore conducted a large-scale feasibility study on a real-world data set from one of the biggest mathematical digital libraries, Zentralblatt MATH, with a special focus on practical applicability.


Keywords: Text Classification, Mathematical Documents, Experiments





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Simon Barthel (1)
  • Sascha Tönnies (2)
  • Wolf-Tilo Balke (1, 2)

  1. IFIS, TU Braunschweig, Braunschweig, Germany
  2. L3S Research Center, Hannover, Germany
