Self-Organising Maps in Document Classification: A Comparison with Six Machine Learning Methods

  • Jyri Saarikoski
  • Jorma Laurikkala
  • Kalervo Järvelin
  • Martti Juhola
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6593)

Abstract

This paper focuses on the use of self-organising maps, also known as Kohonen maps, for classifying text documents. The aim is to classify documents automatically and effectively into separate classes based on their topics. Classification with the self-organising map was tested on three data sets, and the results were compared with those of six well-known baseline methods: k-means clustering, Ward’s clustering, k-nearest-neighbour searching, discriminant analysis, Naïve Bayes classifier and classification tree. Of the unsupervised methods tested, the self-organising map yielded the highest accuracies in classifying the Reuters news collection and the Spanish CLEF 2003 news collection, and in all three data sets its accuracies were comparable to those of some of the supervised methods.
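The abstract does not say how a map that is trained without class labels is turned into a classifier. One common scheme, assumed here rather than taken from the paper, is to train the map on document vectors, label each node with the majority class of the training documents for which it is the best-matching unit, and give a new document the label of its nearest labelled node. The following standard-library Python sketch illustrates that scheme; the function names (`train_som`, `label_nodes`, `classify`), the 3×3 grid, the Gaussian neighbourhood with linearly decaying learning rate and radius, and the toy four-term vectors are all hypothetical choices for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical sketch of SOM-based document classification; not the authors' code.


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; learning rate and neighbourhood radius decay linearly."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    total, step = epochs * len(vectors), 0
    for _ in range(epochs):
        order = list(vectors)
        rng.shuffle(order)  # present documents in a new random order each epoch
        for x in order:
            frac = step / total
            lr = lr0 * (1 - frac)
            radius = max(radius0 * (1 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood on the grid, centred on the best-matching unit.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def label_nodes(weights, vectors, labels):
    """Label each node hit by training data with the majority class of its documents."""
    votes = {}
    for x, y in zip(vectors, labels):
        votes.setdefault(best_matching_unit(weights, x), Counter())[y] += 1
    return {node: c.most_common(1)[0][0] for node, c in votes.items()}


def classify(weights, node_labels, x):
    """Give x the label of its nearest labelled node; nodes with no training hits are skipped."""
    labelled = {n: weights[n] for n in node_labels}
    return node_labels[best_matching_unit(labelled, x)]


# Toy term-weight vectors over a hypothetical vocabulary: oil, price, goal, match.
train = [[1.0, 1.0, 0.0, 0.0], [0.9, 0.8, 0.0, 0.1], [1.0, 0.7, 0.1, 0.0],
         [0.0, 0.1, 1.0, 1.0], [0.1, 0.0, 0.9, 0.8], [0.0, 0.0, 1.0, 0.9]]
labels = ["economy"] * 3 + ["sport"] * 3

weights = train_som(train, rows=3, cols=3)
node_labels = label_nodes(weights, train, labels)
print(classify(weights, node_labels, [0.8, 0.9, 0.0, 0.0]))
```

In this sketch the map itself never sees the labels; class information enters only in the node-labelling step, which is consistent with the abstract counting the self-organising map among the unsupervised methods while still measuring its classification accuracy.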

Keywords

machine learning, neural networks, self-organising map, document classification

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jyri Saarikoski (1)
  • Jorma Laurikkala (1)
  • Kalervo Järvelin (2)
  • Martti Juhola (1)

  1. Department of Computer Sciences, University of Tampere, Finland
  2. Department of Information Studies and Interactive Media, University of Tampere, Finland
