Hypatia Digital Library: A Text Classification Approach Based on Abstracts

Vorgia, Frosso; Triantafyllou, Ioannis; Koulouris, Alexandros

doi:10.1007/978-3-319-33865-1_89

Part of the book series: Springer Proceedings in Business and Economics (SPBE)

Abstract

This paper investigates the application of text classification in Hypatia, the digital library of the Technological Educational Institute of Athens, with the aim of providing an automated classification tool as an alternative to manual assignment. The crucial step in text classification is selecting the most important terms (words) for representing each document. The classic TF.IDF weighting method was investigated. The document collection consists of 718 abstracts in Medicine, Tourism and Food Technology. Classification was conducted using 14 classifiers available in WEKA and yielded an excellent precision score of approximately 97%.
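
The TF.IDF weighting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which TF.IDF variant the authors used, so the formula below (raw term count multiplied by log(N / df)) is an assumption, and the function name `tf_idf` and the three toy documents are hypothetical. The study itself used WEKA; Python is used here only for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = term count * log(N / document frequency).
    """
    n_docs = len(documents)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term])
             for term, count in counts.items()}
        )
    return weights


# Hypothetical toy "abstracts", one per subject area named in the abstract.
docs = [
    "patient treatment clinical study".split(),
    "hotel tourism destination study".split(),
    "food safety fermentation study".split(),
]
weights = tf_idf(docs)
```

Under this variant a term that occurs in every document, such as "study" in the toy data, receives a weight of zero, while terms confined to one subject area receive the highest weights. This is what makes the weighting useful for selecting discriminative terms.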

Author information

Authors and Affiliations

Department of Library Science and Information Systems, Technological Educational Institute of Athens, Aegaleo, Athens, Greece
Frosso Vorgia, Ioannis Triantafyllou & Alexandros Koulouris

Corresponding author

Correspondence to Frosso Vorgia.

Editor information

Editors and Affiliations

Technological Educational Institute of Athens, Egaleo, Greece
Androniki Kavoura
Dept. of Computer Science & Technology, University of Peloponnese, Tripoli, Greece
Damianos P. Sakas
Technological Educational Institute of Athens, Egaleo, Greece
Petros Tomaras

About this paper

Cite this paper

Vorgia, F., Triantafyllou, I., Koulouris, A. (2017). Hypatia Digital Library: A Text Classification Approach Based on Abstracts. In: Kavoura, A., Sakas, D., Tomaras, P. (eds) Strategic Innovative Marketing. Springer Proceedings in Business and Economics. Springer, Cham. https://doi.org/10.1007/978-3-319-33865-1_89

DOI: https://doi.org/10.1007/978-3-319-33865-1_89
Published: 27 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33863-7
Online ISBN: 978-3-319-33865-1
eBook Packages: Business and Management, Business and Management (R0)
