Abstract
The ever-increasing amount of digitally available information is both a curse and a blessing. On the one hand, users have increasingly large amounts of information at their fingertips. On the other hand, assessing and refining web search results becomes more and more tiresome and difficult for non-experts in a domain. Established digital libraries therefore offer specialized collections with a certain degree of quality. This quality can largely be attributed to the great effort invested in the semantic enrichment of the provided documents, e.g. by annotating them with respect to a domain-specific taxonomy. This process is still done manually in many domains, e.g. chemistry (CAS), medicine (MeSH), or mathematics (MSC). But due to the growing amount of data, this manual task is becoming more and more time-consuming and expensive. The only solution to this problem seems to be to employ automated classification algorithms, but it is difficult to draw conclusions for a real-world scenario from the evaluations done in previous research. We therefore conducted a large-scale feasibility study on a real-world data set from one of the biggest mathematical digital libraries, Zentralblatt MATH, with a special focus on practical applicability.
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Barthel, S., Tönnies, S., Balke, WT. (2013). Large-Scale Experiments for Mathematical Document Classification. In: Urs, S.R., Na, JC., Buchanan, G. (eds) Digital Libraries: Social Media and Community Networks. ICADL 2013. Lecture Notes in Computer Science, vol 8279. Springer, Cham. https://doi.org/10.1007/978-3-319-03599-4_10
DOI: https://doi.org/10.1007/978-3-319-03599-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03598-7
Online ISBN: 978-3-319-03599-4
eBook Packages: Computer Science, Computer Science (R0)