Self-Organising Maps in Document Classification: A Comparison with Six Machine Learning Methods
This paper focuses on the use of self-organising maps, also known as Kohonen maps, for the classification of text documents. The aim is to classify documents effectively and automatically into separate classes based on their topics. Classification with the self-organising map was tested on three data sets, and the results were compared with those of six well-known baseline methods: k-means clustering, Ward’s clustering, k-nearest neighbour searching, discriminant analysis, the Naïve Bayes classifier and the classification tree. The self-organising map yielded the highest accuracies among the tested unsupervised methods in classifying the Reuters news collection and the Spanish CLEF 2003 news collection, and accuracies comparable to some of the supervised methods on all three data sets.
Keywords: machine learning, neural networks, self-organising map, document classification
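The abstract does not say how the trained map assigns a class to a document. One common approach, assumed here purely for illustration, is to label each map node by a majority vote of the training documents it attracts and then give a new document the label of its best-matching node. The sketch below implements that idea with NumPy on toy four-dimensional "term weight" vectors. The function names, grid size, decay schedules and toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np
from collections import Counter, defaultdict


def train_som(X, rows=3, cols=3, epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Fit a small rectangular SOM to the rows of X using online updates."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of every node, shape (rows * cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly chosen training vectors.
    weights = X[rng.integers(0, len(X), size=rows * cols)].astype(float)
    total_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.3  # shrinking radius with a floor of 0.3
            # Best-matching unit: the node whose weights are closest to the sample.
            bmu = np.argmin(np.linalg.norm(weights - X[i], axis=1))
            # Gaussian neighbourhood measured on the grid around the BMU.
            grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (X[i] - weights)
            step += 1
    return weights


def best_matching_unit(weights, x):
    """Index of the node closest to x in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))


def label_nodes(weights, X, y):
    """Assumed labelling rule: majority class of training vectors per node."""
    votes = defaultdict(Counter)
    for x, label in zip(X, y):
        votes[best_matching_unit(weights, x)][label] += 1
    return {node: counts.most_common(1)[0][0] for node, counts in votes.items()}


def classify(weights, node_labels, x):
    """Return the label of the nearest node that received a label."""
    order = np.argsort(np.linalg.norm(weights - x, axis=1))
    for node in order:
        if int(node) in node_labels:
            return node_labels[int(node)]
    raise ValueError("no labelled nodes available")


# Toy data: two well-separated topics in a four-term vector space.
data_rng = np.random.default_rng(1)
sport = data_rng.normal([1.0, 0.9, 0.0, 0.1], 0.05, size=(20, 4))
finance = data_rng.normal([0.0, 0.1, 1.0, 0.9], 0.05, size=(20, 4))
X = np.vstack([sport, finance])
y = ["sport"] * 20 + ["finance"] * 20

weights = train_som(X)
node_labels = label_nodes(weights, X, y)
prediction = classify(weights, node_labels, np.array([0.95, 0.85, 0.05, 0.1]))
print(prediction)
```

The map itself is trained without labels, which is why the abstract groups it with the unsupervised methods. Labels enter only in the node-labelling step, so the same trained map could be relabelled for a different class scheme without retraining.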