Abstract
A huge amount of data is now stored as electronic documents on the World Wide Web. Text classification algorithms are widely used to assign such documents to a fixed number of predefined classes. These algorithms differ in both their scope of application and their performance, so finding an appropriate algorithm for a given dataset has become an important concern for researchers who need to solve practical problems quickly. This paper presents an experimental evaluation of five widely used text classification algorithms, namely decision tree (C5.0), support vector machine, K-nearest neighbor, Naïve Bayes, and neural network, comparing them with one another under both TF and TF-IDF feature selection methods on four public datasets: 20news-bydate, ohsumed-first-20000-docs, Reuters 21578-Apte-90 Cat, and 20 Newsgroup. The experimental results are examined from multiple perspectives and summarized to indicate how useful each algorithm is on each dataset.
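The abstract does not say which TF or TF-IDF variant the study uses. As a minimal sketch of how the two representations differ, the following assumes the textbook definitions (relative term frequency, and unsmoothed IDF computed as log(N / df)). The function names and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Relative term frequency: count of each term divided by document length."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def inverse_document_frequencies(tokenized_docs):
    """Unsmoothed IDF (an assumption): log(N / df), df = documents containing the term."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(tokenized_docs):
    """Weight each document's TF values by the corpus-level IDF."""
    idf = inverse_document_frequencies(tokenized_docs)
    return [
        {term: tf * idf[term] for term, tf in term_frequencies(tokens).items()}
        for tokens in tokenized_docs
    ]


# Hypothetical toy corpus, already tokenized on whitespace.
docs = [
    "the game ended in a draw".split(),
    "the patient was given the drug".split(),
    "the market closed higher".split(),
]

tf_weights = [term_frequencies(tokens) for tokens in docs]
tfidf_weights = tf_idf(docs)
```

Under these definitions, a term that occurs in every document (here "the") keeps a nonzero TF weight but receives a TF-IDF weight of zero, which is the main practical difference between the two representations when they are fed to a classifier.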
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bramesh, S.M., Anil Kumar, K.M. (2019). Empirical Study to Evaluate the Performance of Classification Algorithms on Public Datasets. In: Sridhar, V., Padma, M., Rao, K. (eds) Emerging Research in Electronics, Computer Science and Technology. Lecture Notes in Electrical Engineering, vol 545. Springer, Singapore. https://doi.org/10.1007/978-981-13-5802-9_41
DOI: https://doi.org/10.1007/978-981-13-5802-9_41
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5801-2
Online ISBN: 978-981-13-5802-9
eBook Packages: Engineering, Engineering (R0)