Abstract
The purpose of this paper is to investigate the application of text classification in Hypatia, the digital library of the Technological Educational Institute of Athens, in order to provide an automated classification tool as an alternative to manual assignment. The crucial point in text classification is the selection of the most important terms for document representation. The classic TF.IDF weighting method was investigated. Our document collection consists of 718 abstracts in Medicine, Tourism and Food Technology. Classification was conducted using 14 classifiers available in WEKA. The classification process yielded an excellent precision score of approximately 97 %.
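The abstract names TF.IDF as the weighting scheme but does not specify the exact variant or the preprocessing. The following is a minimal Python sketch, assuming raw term frequency multiplied by log(N/df) and a tiny hypothetical stop-word list. The functions `tokenize`, `tf_idf` and `top_terms`, the `STOPWORDS` set, and the three sample sentences are illustrative inventions, not the authors' code. WEKA's own `StringToWordVector` filter offers TF and IDF transform options that may weight terms differently.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual preprocessing is not described in the abstract.
STOPWORDS = {"and", "in", "of", "the"}


def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, and drop stop words."""
    return [tok for tok in re.findall(r"[a-z]+", text.lower()) if tok not in STOPWORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within this document
        vectors.append({term: count * math.log(n_docs / df[term]) for term, count in tf.items()})
    return vectors


def top_terms(vector, k=3):
    """Return the k highest-weighted terms, breaking ties alphabetically."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Three invented one-sentence "abstracts", one per subject area.
    abstracts = [
        "Clinical study of patient treatment in medicine.",
        "Hotel demand and tourism marketing in Greece.",
        "Food processing and food safety in food technology.",
    ]
    for vector in tf_idf(abstracts):
        print(top_terms(vector))
```

Run as a script, it prints the three highest-weighted terms for each sample: `['clinical', 'medicine', 'patient']`, `['demand', 'greece', 'hotel']` and `['food', 'processing', 'safety']`. A term that occurs in every document gets idf = log(1) = 0, which is how TF.IDF suppresses words that do not help discriminate between subject areas.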
Copyright information
© 2017 Springer International Publishing Switzerland
About this paper
Cite this paper
Vorgia, F., Triantafyllou, I., Koulouris, A. (2017). Hypatia Digital Library: A Text Classification Approach Based on Abstracts. In: Kavoura, A., Sakas, D., Tomaras, P. (eds) Strategic Innovative Marketing. Springer Proceedings in Business and Economics. Springer, Cham. https://doi.org/10.1007/978-3-319-33865-1_89
DOI: https://doi.org/10.1007/978-3-319-33865-1_89
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33863-7
Online ISBN: 978-3-319-33865-1
eBook Packages: Business and Management; Business and Management (R0)