Hypatia Digital Library: A Text Classification Approach Based on Abstracts

  • Conference paper
Strategic Innovative Marketing

Abstract

This paper investigates the application of text classification in Hypatia, the digital library of the Technological Educational Institute of Athens, with the aim of providing an automated classification tool as an alternative to manual assignment. The crucial step in text classification is selecting the most important terms for document representation, and the classic TF.IDF weighting method was investigated for this purpose. The document collection consists of 718 abstracts in Medicine, Tourism and Food Technology. Classification was conducted using 14 classifiers available in WEKA and yielded an excellent precision score of approximately 97%.
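As a rough illustration of the TF.IDF weighting mentioned in the abstract, the following minimal Python sketch computes term weights for a toy collection. It assumes the common formulation tf × log(N / df); the exact variant and WEKA configuration used in the paper are not stated here. The function names (`tokenize`, `tf_idf`) and the three sample texts are hypothetical and are not taken from the Hypatia collection.

```python
import math
from collections import Counter

# Hypothetical toy abstracts, one per subject area named in the paper.
ABSTRACTS = [
    "the drug therapy improved the patients",
    "the hotel welcomed the tourists",
    "the olive oil extended the shelf life",
]

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: (term count / document length) * log(N / df),
    where N is the number of documents and df is the number of
    documents that contain the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

vectors = tf_idf(ABSTRACTS)

# Print the terms of the first abstract, highest weight first.
for term, weight in sorted(vectors[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

Under this weighting, a term that occurs in every document (here "the") receives a weight of zero, while terms specific to one document (such as "drug") receive positive weights. This is why TF.IDF is a natural way to select the terms that distinguish one subject area from another.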

Author information

Corresponding author

Correspondence to Frosso Vorgia.

Copyright information

© 2017 Springer International Publishing Switzerland

About this paper

Cite this paper

Vorgia, F., Triantafyllou, I., Koulouris, A. (2017). Hypatia Digital Library: A Text Classification Approach Based on Abstracts. In: Kavoura, A., Sakas, D., Tomaras, P. (eds) Strategic Innovative Marketing. Springer Proceedings in Business and Economics. Springer, Cham. https://doi.org/10.1007/978-3-319-33865-1_89
